(54) Method of describing object region data, apparatus for generating object region data, video
processing apparatus and video processing method 



(57) A region data describing method for describing,
over a plurality of frames, region data about the region
of an arbitrary object in a video, the method specifying
the object region in the video with at least either of an
approximate figure approximating the region or characteristic
points of the region, approximating a trajectory
obtained by arranging position data of the representative
points or the characteristic point in a direction in
which frames proceed with a predetermined function
and describing the parameter of the function as region
data. Thus, the region of a predetermined object in the
video can be described with a small quantity of data.
Moreover, creation and handling of data can easily be
performed.
Description

[0001] The present invention relates to a method of describing object region data such that information about an
object region in a video is described, an apparatus for generating object region data such that information about an
object region in a video is generated, a video processing apparatus arranged to be given an instruction about an object
in a video to perform a predetermined process or retrieve an object in a video, and a video processing method therefor.
[0002] Hyper media are configured such that related information, called a hyper link, is attached between media
such as videos, sounds or texts to permit mutual reference. When videos are mainly used, related information is
provided for each object which appears in the video. When the object is specified, the related information (text
information or the like) is displayed. The foregoing structure is a representative example of hyper media. The object in
the video is expressed by a frame number or a time stamp of the video and by information for identifying a region in the
video, which are recorded in the video data or recorded as separate data.

[0003] Mask images have frequently been used as means for identifying a region in a video. The mask image is a
bit map image constituted by giving different pixel values to the inside portion of an identified region and the outside
portion of the same. The simplest method gives a pixel value of "1" to the inside portion
of the region and "0" to the outside portion of the same. Alternatively, α values, which are employed in computer
graphics, are sometimes employed. Since the α value is usually able to express 256 levels of gray, a portion of the levels
is used. The inside portion of the specified region is expressed as 255, while the outside portion of the same is
expressed as 0. The latter image is called an α map. When the regions in the image are expressed by the mask images,
determination whether or not a pixel in a frame is included in the specified region can easily be made by reading the
value of the pixel of the mask image and by determining whether the value is 0 or 255. The mask image has the freedom
to express a region regardless of the shape of the region, and even a discontinuous region can be
expressed. The mask image, however, must have the same number of pixels as the original image. Thus,
there arises a problem in that the quantity of data cannot be reduced.
[0004] To reduce the quantity of data of the mask image, the mask image is frequently compressed. When the
mask image is a binary mask image constituted by 0 and 1, it can be processed as a binary image. Therefore,
the compression method employed in facsimile machines or the like is frequently employed. In the case of MPEG-4,
which has been standardized by ISO/IEC MPEG (Moving Picture Experts Group), an arbitrary shape coding method will
be employed in which the mask image constituted by 0 and 1 and the mask image using the α value are compressed.
The foregoing compression method is a method using motion compensation and capable of improving compression
efficiency. On the other hand, complex compression and decoding processes are required.
[0005] To express a region in a video, the mask image or the compressed mask image has usually been employed. 
However, data for identifying a region is required to permit easy and quick extraction, to be reduced in quantity and to 
permit easy handling. 

[0006] On the other hand, hyper media, which usually assume an operation for displaying related
information of a moving object in a video, make specifying the object somewhat more difficult than
handling a still image. A user usually has difficulty in specifying a specific portion. Therefore, it can be considered
that the user usually aims roughly at, for example, a portion in the vicinity of the center of the object. Moreover,
a portion adjacent to the object but deviated from the object is frequently specified because of the movement of
the object. Therefore, data for specifying a region is desired to be adaptable to the foregoing media. Moreover, an aiding
mechanism for facilitating specification of a moving object in a video is required for the system for displaying related
information of the moving object in the video.

[0007] As described above, the conventional method of expressing a desired object region in a video by using the
mask image suffers from a problem in that the quantity of data cannot be reduced. The method arranged to compress
the mask image raises a problem in that coding and decoding become too complicated. What is worse, the pixels
of a predetermined frame cannot be accessed directly, which makes handling difficult.
[0008] There arises another problem in that a device for permitting a user to easily specify a moving object in a
video has not been provided.

[0009] Accordingly, it is an object of the present invention to provide a method of describing object region data and
an apparatus for generating object region data which are capable of describing a desired object region in a video by
using a small quantity of data and facilitating generation of data and handling of the same.

[0010] Another object of the present invention is to provide a method of describing object region data, an apparatus
for generating object region data, a video processing method and a video processing apparatus with which a user is
permitted to easily instruct an object in a video and determine the object.
[0011] Another object of the present invention is to provide a method of describing object region data, an apparatus
for generating object region data, a video processing method and a video processing apparatus with which retrieval of
an object in a video can easily be performed.

[0012] According to one aspect of the present invention, there is provided a method of describing object region data
such that information about an arbitrary object region in a video is described over a plurality of continuous frames, the
method identifying a desired object region in a video according to at least either of a figure approximated to the object
region or a characteristic point of the object region; approximating a trajectory obtained by arranging positions of representative
points of the approximate figure or the characteristic points of the object region in a direction in which frames
proceed with a predetermined function; and describing information about the object region by using the parameter of
the function.
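
As a minimal sketch of the idea in [0012], the Python below fits a first-order function x(t) = a*t + b to one coordinate of a representative point by least squares, so that only (a, b) is stored instead of one position per frame. The name `fit_line`, the choice of a straight line as the "predetermined function" and the sample coordinates are assumptions made for illustration; the text above does not fix a particular function.

```python
def fit_line(frames, values):
    """Least-squares fit of values ~ a * frame + b; returns (a, b)."""
    n = len(frames)
    mean_t = sum(frames) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in frames)
    if var_t == 0:
        # A single frame (or repeated frame numbers): constant function.
        return 0.0, mean_v
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(frames, values))
    a = cov / var_t
    return a, mean_v - a * mean_t


# Hypothetical positions of one representative point over five frames.
frames = [0, 1, 2, 3, 4]
xs = [10.0, 12.0, 14.0, 16.0, 18.0]
ys = [5.0, 5.0, 5.0, 5.0, 5.0]

# Two (a, b) pairs replace five (x, y) positions.
x_params = fit_line(frames, xs)
y_params = fit_line(frames, ys)
```

A higher-order polynomial or a spline could be substituted for the straight line without changing the overall scheme; only the number of stored parameters would change.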

[0013] According to another aspect of the present invention, there is provided a method of describing object region
data such that information about an arbitrary object region in a video is described over a plurality of continuous frames,
the method describing the object region data by using information capable of identifying at least the frame number of a
leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the
leading frame and the time stamp of the trailing frame, information for identifying the type of the figure of an approximate
figure approximating the object region, and the parameter of a function with which a trajectory obtained by arranging
position data of representative points of the approximate figure corresponding to the object region in a direction in which
frames proceed has been approximated.
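
The items listed in [0013] can be pictured as a small record. The following is only a sketch under assumed names (`ObjectRegionData`, `PointTrajectory`, `covers` and all field names are hypothetical, as are the sample values); it is not the layout defined by the embodiments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PointTrajectory:
    """Function parameters for one representative point (hypothetical layout)."""
    x_params: Tuple[float, ...]  # parameters of the function approximating x(t)
    y_params: Tuple[float, ...]  # parameters of the function approximating y(t)


@dataclass
class ObjectRegionData:
    """One object region described over a run of continuous frames."""
    first_frame: int   # frame number (or time stamp) of the leading frame
    last_frame: int    # frame number (or time stamp) of the trailing frame
    figure_type: str   # identifies the type of approximate figure
    trajectories: List[PointTrajectory] = field(default_factory=list)

    def covers(self, frame: int) -> bool:
        """True when the description is valid for the given frame."""
        return self.first_frame <= frame <= self.last_frame


# Illustrative record: an ellipse described by three representative points.
region = ObjectRegionData(
    first_frame=100,
    last_frame=250,
    figure_type="ellipse",
    trajectories=[
        PointTrajectory((2.0, 10.0), (0.0, 50.0)),
        PointTrajectory((2.0, 40.0), (0.0, 50.0)),
        PointTrajectory((2.0, 25.0), (0.0, 30.0)),
    ],
)
```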

[0014] According to another aspect of the present invention, there is provided a method of describing object region
data such that information about an arbitrary object region in a video is described over a plurality of continuous frames,
the method describing the object region data by using information capable of identifying at least the frame number of a
leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the
leading frame and the time stamp of the trailing frame, the number of approximate figures approximating the object
region, information for identifying the type of the figure of an approximate figure and the parameters of functions with
which trajectories corresponding to the approximate figures and obtained by arranging position data of representative
points of each approximate figure in a direction in which frames proceed have been approximated.
[0015] According to another aspect of the present invention, there is provided a method of describing object region
data such that information about an arbitrary object region in a video is described over a plurality of continuous frames,
the method describing the object region data by using information capable of identifying at least the frame number of a
leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the
leading frame and the time stamp of the trailing frame, and the parameter of a function with which a trajectory obtained
by arranging position data of characteristic points of the object region in a direction in which frames proceed has been
approximated.

[0016] Information capable of identifying the frame number of a leading frame and the frame number of a trailing
frame of the plurality of the subject frames or the time stamp of the leading frame and the time stamp of the trailing
frame is the leading frame number and the trailing frame number, or the leading frame number and the difference between
the leading frame number and the trailing frame number.

[0017] The parameter of the function may be position data of knots of the trajectory and information arranged to be
used together with the position data of the knots to be capable of identifying the trajectory. Alternatively, the parameter
of the function may be a coefficient of the function.
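
To illustrate the knot form of the parameter in [0017], the sketch below stores a trajectory as a sorted list of knots and recovers the position in any frame by interpolating between neighbouring knots. Linear interpolation, the `(frame, x, y)` tuple layout and the name `position_at` are assumptions for this example; in the knot form the accompanying information would tell the reader which interpolating function to use.

```python
from bisect import bisect_right


def position_at(knots, frame):
    """Interpolate linearly between knots given as (frame, x, y), sorted by frame."""
    frames = [k[0] for k in knots]
    if not frames[0] <= frame <= frames[-1]:
        raise ValueError("frame outside the described range")
    i = bisect_right(frames, frame) - 1  # index of the knot at or before frame
    if i == len(knots) - 1:
        return knots[i][1], knots[i][2]
    f0, x0, y0 = knots[i]
    f1, x1, y1 = knots[i + 1]
    w = (frame - f0) / (f1 - f0)
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)


# Hypothetical trajectory of one representative point: three knots cover 21 frames.
knots = [(0, 0.0, 0.0), (10, 20.0, 10.0), (20, 20.0, 30.0)]
mid_first_segment = position_at(knots, 5)
mid_second_segment = position_at(knots, 15)
```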

[0018] When a plurality of representative points of the approximate figure of the object region or characteristic
points of the object region exist, it is desirable to identify the correspondence between the plural representative points
or the characteristic points of the present frame and a plurality of representative points or characteristic points of an
adjacent frame.
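
One simple way to establish the correspondence described in [0018] is to reorder the points of the present frame so that the total displacement from the adjacent frame is smallest. The sketch below does this by brute force over all orderings, which is practical only for the handful of representative points a figure has. This criterion and the name `match_points` are assumptions for illustration, not necessarily the procedure used in the embodiments.

```python
from itertools import permutations
from math import dist


def match_points(prev_pts, curr_pts):
    """Reorder curr_pts so that item i corresponds to prev_pts[i].

    The ordering with the smallest summed Euclidean displacement is chosen.
    """
    best = min(
        permutations(curr_pts),
        key=lambda perm: sum(dist(p, q) for p, q in zip(prev_pts, perm)),
    )
    return list(best)


# Hypothetical representative points detected in two adjacent frames;
# the detector happened to return them in a different order in the second frame.
prev_pts = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
curr_pts = [(5.5, 8.0), (0.5, 0.0), (10.5, 0.0)]
matched = match_points(prev_pts, curr_pts)
```

Without such matching, the per-point trajectories would jump between unrelated points and could not be approximated well by a smooth function.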

[0019] It is desirable to describe information related to the object or a method of accessing the related information.

[0020] According to another aspect of the present invention, there is provided a recording medium storing object 
region data containing information about regions of one or more objects described by one of the above methods. 

[0021] According to another aspect of the present invention, there is provided a recording medium storing object
region data containing information about regions of one or more objects described by one of the above methods and
information related to each object or information indicating a method of accessing the related information.
[0022] According to another aspect of the present invention, there is provided a recording medium storing object
region data containing information about regions of one or more objects described by one of the above methods and
information for identifying information related to each object, and information related to each object.

[0023] According to another aspect of the present invention, there is provided a video processing method for determining
whether or not a predetermined object has been specified in a screen which is displaying a video, the method
obtaining information describing a parameter of a function approximating a trajectory obtained by arranging position data
of representative points of the approximate figure in a direction in which frames proceed when an arbitrary position has
been specified in the screen in a case where a region of the predetermined object exists in the video; detecting the position
of the representative point in the frame based on the obtained information; detecting the position of the approximate
figure in accordance with the detected position of the representative point; determining whether or not the input position
exists in the approximate figure; and determining that the predetermined object has been specified when a determination
has been made that the input position exists in the approximate figure.
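
The steps of [0023] can be sketched as follows, assuming a polygonal approximate figure whose vertices are the representative points and assuming each coordinate trajectory is a linear function of the frame number. The names `vertices_at`, `point_in_polygon` and `object_specified`, the linear form and the sample square are all illustrative assumptions.

```python
def vertices_at(trajectories, frame):
    """Evaluate each vertex trajectory ((ax, bx), (ay, by)) at the given frame."""
    return [(ax * frame + bx, ay * frame + by) for (ax, bx), (ay, by) in trajectories]


def point_in_polygon(pt, vertices):
    """Ray-casting test: count edge crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge spans the horizontal line through pt
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def object_specified(trajectories, frame, click):
    """True when the clicked position lies inside the figure in that frame."""
    return point_in_polygon(click, vertices_at(trajectories, frame))


# Hypothetical 10 x 10 square moving right by 2 pixels per frame.
square = [
    ((2.0, 0.0), (0.0, 0.0)),
    ((2.0, 10.0), (0.0, 0.0)),
    ((2.0, 10.0), (0.0, 10.0)),
    ((2.0, 0.0), (0.0, 10.0)),
]
hit_in_frame_0 = object_specified(square, 0, (5.0, 5.0))
hit_in_frame_10 = object_specified(square, 10, (5.0, 5.0))
```

The same click position hits the object in frame 0 but misses it in frame 10, because the figure is reconstructed from the function parameters for the frame being displayed.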

[0024] According to another aspect of the present invention, there is provided a video processing method for determining
whether or not a predetermined object has been specified in a screen which is displaying a video, the method
obtaining information describing a parameter of a function approximating a trajectory obtained by arranging position data
of characteristic points of the object region in a direction in which frames proceed when an arbitrary position has been
specified in the screen in a case where a region of the predetermined object exists in the video; detecting the positions
of the characteristic points in the frame in accordance with the obtained information; determining whether or not the distance
between the input position and the detected position of the characteristic point is shorter than a reference value;
and determining that the predetermined object has been specified when a determination has been made that the distance
is shorter than the reference value.
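
The distance test of [0024] reduces to a few lines once the characteristic points have been located in the current frame. In this sketch the function name `hits_characteristic_point`, the coordinates and the reference values are hypothetical.

```python
from math import hypot


def hits_characteristic_point(points, click, reference):
    """True when any characteristic point lies closer to click than reference."""
    cx, cy = click
    return any(hypot(px - cx, py - cy) < reference for px, py in points)


# Hypothetical characteristic points of one object in the displayed frame.
points = [(100.0, 100.0), (120.0, 100.0)]
near = hits_characteristic_point(points, (103.0, 104.0), reference=6.0)
far = hits_characteristic_point(points, (103.0, 104.0), reference=5.0)
```

The click is exactly 5 pixels from the first point, so it counts as a hit for a reference value of 6 but not for 5, because the distance must be strictly shorter than the reference value.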

[0025] When a determination has been made that the predetermined object has been specified, it is desirable to 
show information related to the predetermined object. 

[0026] According to another aspect of the present invention, there is provided a video processing method of displaying
a region in which a predetermined object exists when the predetermined object has been specified in a screen
which is displaying a video, the video processing method obtaining information describing a parameter of a function
approximating a trajectory obtained by arranging position data of at least representative points of an approximate figure
of the object region or characteristic points of the object region in a direction in which frames proceed when the region
of the predetermined object exists in the video; detecting the representative point or the characteristic point in the frame
in accordance with the obtained information; and displaying information for displaying the position of the object region
in the screen in a predetermined form of display in accordance with the detected representative point or the characteristic
point.

[0027] According to another aspect of the present invention, there is provided a video processing method for
retrieving, from among objects which appear in a video, a predetermined object which satisfies a predetermined condition,
the video processing method inputting an arbitrary position in the video and a retrieving condition determined in
accordance with the input position; obtaining information describing a parameter of a function approximating a trajectory
obtained by arranging position data of representative points of an approximate figure of an object region produced for
each object which appears in the video or a characteristic point of the object region in a direction in which frames proceed;
determining, for each object over a plurality of frames, whether or not the representative point of the approximate
figure or the characteristic point and the input position have a predetermined relationship in one frame of one object
obtained in accordance with the obtained information; and detecting the predetermined object satisfying the retrieving
condition in accordance with a result of determination.

[0028] The predetermined relationship may be the relationship that the input position exists in the approximate figure
region or the relationship that the distance from the characteristic point to the input position is shorter than a reference
value. The retrieving condition may be a condition of an object which is to be extracted, which is selected from a
retrieval condition group consisting of a condition that at least one frame satisfying the predetermined relationship exists
at the input position, a condition that a predetermined number of frames each satisfying the predetermined relationship
exist successively with regard to the input position and a condition that the predetermined relationship is not satisfied
in all of the frames.
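
The three retrieval conditions of [0028] can be expressed over a per-frame list of booleans, one entry per frame, stating whether the predetermined relationship held in that frame. The sketch below assumes that list has already been computed; the function names are hypothetical, and the third condition is read here as "the relationship holds in no frame", which is an interpretation of the wording above.

```python
def any_frame(hits):
    """At least one frame satisfies the relationship."""
    return any(hits)


def consecutive_frames(hits, count):
    """The relationship holds in at least `count` successive frames."""
    run = 0
    for hit in hits:
        run = run + 1 if hit else 0
        if run >= count:
            return True
    return False


def no_frame(hits):
    """The relationship is satisfied in none of the frames (assumed reading)."""
    return not any(hits)


# Hypothetical per-frame results for one object and one input position.
hits = [False, True, True, True, False]
passes_through = any_frame(hits)
stays_three = consecutive_frames(hits, 3)
stays_four = consecutive_frames(hits, 4)
never_there = no_frame(hits)
```

The first condition finds objects that pass through the position, the second finds objects that stay there for a given time, and the third finds objects that never reach it.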

[0029] The retrieval condition group includes, as a condition which must be added to the condition which is determined
in accordance with the position, an attribute condition which must be satisfied by the approximate figure of the
object.

[0030] According to another aspect of the present invention, there is provided a video processing method for
retrieving, from among objects which appear in a video, a predetermined object which satisfies a predetermined condition,
the video processing method inputting information for specifying a trajectory of the position in a video which is to
be retrieved; obtaining information describing a parameter of a function approximating a trajectory obtained by arranging
position data of representative points of an approximate figure of the object region produced for each object which
appears in a video and which is to be retrieved or a characteristic point of the object region in a direction in which frames
proceed; evaluating, for each object, similarity of the trajectory of the representative point or the characteristic point of
the one object detected in accordance with the obtained information and the trajectory of the input position; and detecting
the predetermined object corresponding to the specified trajectory.

[0031] Information for specifying the trajectory of the position may be time sequence information including the relationship
between the position and time. The similarity may be evaluated with the positional relationship taken into account.
[0032] The specified trajectory may be a trajectory of an object in a video which has been specified. Alternatively,
a user may be permitted to input the trajectory by drawing the trajectory on a GUI.
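
As one possible reading of the similarity evaluation in [0030] and [0031], the sketch below compares two trajectories sampled at the same times by their mean point-to-point distance. Setting `use_position=False` shifts both trajectories to a common origin so that only their shape is compared. The measure itself, the flag and the names `mean_distance` and `most_similar` are assumptions for illustration; the text does not prescribe a specific similarity measure.

```python
from math import dist


def mean_distance(traj_a, traj_b, use_position=True):
    """Mean distance between equally sampled trajectories; smaller is more similar."""
    if not traj_a or len(traj_a) != len(traj_b):
        raise ValueError("trajectories must be non-empty and equally sampled")
    if not use_position:
        # Translate both trajectories so that they start at the origin.
        ax0, ay0 = traj_a[0]
        bx0, by0 = traj_b[0]
        traj_a = [(x - ax0, y - ay0) for x, y in traj_a]
        traj_b = [(x - bx0, y - by0) for x, y in traj_b]
    return sum(dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)


def most_similar(query, candidates):
    """Return the key of the candidate trajectory closest to the query."""
    return min(candidates, key=lambda name: mean_distance(query, candidates[name]))


# Hypothetical query drawn by a user and two object trajectories.
query = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = {
    "object_a": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    "object_b": [(0.0, 0.0), (1.0, 5.0), (2.0, 10.0)],
}
best = most_similar(query, candidates)
```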

[0033] According to another aspect of the present invention, there is provided an object-region-data generating
apparatus for generating data about described information of a region of an arbitrary object in a video over a plurality of
continuous frames, the object-region-data generating apparatus comprising a circuit configured to approximate an
object region in the video in a plurality of the subject frames by using a predetermined figure; a detector configured to
detect, in the plural frames, coordinate values of the predetermined number of representative points identifying the predetermined
figure which has been used in the approximation; and a circuit configured to approximate a trajectory of a
time sequence of the coordinate values of the representative points obtained over the plurality of the continuous frames
with a predetermined function, so that information about the object region is generated by using the parameter of the
function.

[0034] According to another aspect of the present invention, there is provided an object-region-data generating
apparatus for generating data about described information of a region of an arbitrary object in a video over a plurality of
continuous frames, the object-region-data generating apparatus comprising a detector configured to detect the coordinate
values of the predetermined number of characteristic points of an object region in a video over the plurality of the
subject frames, and a circuit configured to approximate a time sequential trajectory of the coordinate values of the characteristic
points obtained over the plurality of the continuous frames with a predetermined function, wherein the parameter
of the function is used to generate information about the object region.

[0035] According to another aspect of the present invention, there is provided a video processing apparatus for performing
a predetermined process when a predetermined object has been specified in a screen which is displaying a
video, the video processing apparatus comprising a circuit configured to obtain a parameter of a function approximating
a trajectory obtained by arranging position data of representative points of an approximate figure of the object region in
a direction in which frames proceed in a case where a region of a predetermined object exists in the video when an arbitrary
position has been specified in the screen to detect the position of the representative point in the frame; a detector
configured to detect the position of the approximate figure in accordance with the detected position of the representative
point; and a circuit configured to determine whether or not the input position exists in the approximate figure.
[0036] According to another aspect of the present invention, there is provided a video processing apparatus for performing
a predetermined process when a predetermined object has been specified in a screen which is displaying a
video, the video processing apparatus comprising a circuit configured to obtain a parameter of a function approximating
a trajectory obtained by arranging position data of a characteristic point of the object region in a direction in which
frames proceed in a case where the region of the predetermined object exists in the video when an arbitrary position has
been specified in the screen to detect the position of the characteristic point in the frame; and a circuit configured to
determine whether or not the distance between the input position and the detected position of the characteristic point
is shorter than a reference value.

[0037] According to another aspect of the present invention, there is provided a video processing apparatus for performing
a predetermined process when a predetermined object has been specified in a screen which is displaying a
video, the video processing apparatus comprising a circuit configured to obtain a parameter of a function approximating
a trajectory obtained by arranging position data of at least a representative point of an approximate figure of the object
region or a characteristic point of the object region in a direction in which frames proceed when the region of the predetermined
object exists in the video to detect the representative point or the characteristic point in the frame; and a
circuit configured to display information for indicating the position of the object region in the screen in a predetermined
display form.

[0038] According to another aspect of the present invention, there is provided a video processing apparatus for
retrieving, from among objects which appear in a video, a predetermined object which satisfies a specified condition,
the video processing apparatus comprising a circuit configured to obtain information describing a parameter of a function
approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the
object region produced for each object which appears in a video which is to be retrieved or a characteristic point of the
object region in a direction in which frames proceed when an arbitrary position in the video which is to be retrieved and
a retrieving condition determined in accordance with the position have been input; a circuit configured to determine, for
each object over a plurality of the frames, whether or not the approximate figure or the characteristic point of one object
in one frame obtained in accordance with the obtained information and the input position satisfy a predetermined relationship;
and a detector configured to detect an object which satisfies the retrieving condition in accordance with a result
of the determination.

[0039] According to another aspect of the present invention, there is provided a video processing apparatus for
retrieving, from among objects which appear in a video, a predetermined object which satisfies a specified condition,
the video processing apparatus comprising a circuit configured to obtain information describing a parameter of a function
approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the
object region produced for each object which appears in the video which is to be retrieved or a characteristic point of
the object region in a direction in which frames proceed when information for specifying a trajectory of the position in a
video which is to be retrieved has been input; a circuit configured to evaluate, for each object, similarity between the
trajectory of the representative point or the characteristic point of one object obtained in accordance with the obtained
information and the trajectory of the input position; and a detector configured to detect the predetermined object corresponding
to the specified trajectory in accordance with the evaluated similarity.

[0040] Note that the present invention relating to the apparatus may be employed as the method and the present
invention relating to the method may be employed as the apparatus.

[0041] The present invention relating to the apparatus and the method may be employed as a recording medium
which stores a program for causing a computer to perform the procedure according to the present invention (or causing
the computer to serve as means corresponding to the present invention or causing the computer to realize the function
corresponding to the present invention) and which can be read by the computer.

[0042] The present invention is configured such that the object region in a video over a plurality of frames is
described as a parameter of a function approximating a trajectory obtained by arranging position data of representative
points of an approximate figure of the object region or a characteristic point of the object region in a direction in which
frames proceed. Therefore, the object region in the video over the plural frames can be described with a small quantity
of the function parameters. Hence it follows that the quantity of data required to identify the object region can effectively
be reduced. Moreover, handling can be facilitated. Moreover, extraction of a representative point or a characteristic
point from the approximate figure or generation of the parameter of the approximate curve can easily be performed.
Moreover, generation of an approximate figure from the parameter of the approximate curve can easily be performed.
[0043] When the representative point of the approximate figure is employed, fundamental figures, for example, one
or more ellipses, are employed such that each ellipse is represented by two focal points and another point. Thus,
whether or not arbitrary coordinates specified by a user exist in the object region (the approximate figure) can be determined
by using a simple discriminant. Hence it follows that the user is able to easily instruct a moving object in a video.
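
The "simple discriminant" for an ellipse in [0043] can be illustrated as follows, assuming that the third representative point lies on the ellipse itself: a position is inside when the sum of its distances to the two focal points does not exceed the same sum for the point on the ellipse. The name `inside_ellipse` and the sample coordinates are hypothetical.

```python
from math import dist


def inside_ellipse(p, f1, f2, on_ellipse):
    """True when p lies inside or on the ellipse with foci f1, f2 through on_ellipse."""
    limit = dist(on_ellipse, f1) + dist(on_ellipse, f2)  # equals the major axis length
    return dist(p, f1) + dist(p, f2) <= limit


# Hypothetical ellipse: foci at (-3, 0) and (3, 0), passing through (0, 4).
f1, f2, q = (-3.0, 0.0), (3.0, 0.0), (0.0, 4.0)
centre_hit = inside_ellipse((0.0, 0.0), f1, f2, q)
outside_miss = inside_ellipse((6.0, 0.0), f1, f2, q)
```

Only three point pairs have to be tracked per ellipse, and the test needs four distance computations, regardless of the size of the region in pixels.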
[0044] When the characteristic point is employed, whether or not the arbitrary coordinates specified by a user indicate
the object region can be determined quite easily. Thus, a moving object in a video can easily be specified
by the user.

[0045] When display of an object region among regions of objects which can be identified by using object region
data and which has related information, or display of an image indicating the object region is controlled, the user is permitted
to quickly recognize whether or not related information exists and the position of the object region. Therefore, the
operation which is performed by the user can effectively be aided.
[0046] According to the present invention, retrieval of an object in a video can easily be performed in accordance
with a position in a video through which the object passes, residence time at a certain point or a trajectory.
[0047] This summary of the invention does not necessarily describe all necessary features so that the invention 
may also be a sub-combination of these described features. 

[0048] The invention can be more fully understood from the following detailed description when taken in conjunction
with the accompanying drawings, in which:

FIG. 1 is a diagram showing an example of the structure of an object-region-data generating apparatus according 
to a first embodiment of the present invention; 

FIGS. 2A, 2B, 2C and 2D are diagrams showing a procedure for describing an object region in a video with object
region data;

FIG. 3 is a diagram showing an example of a process for approximating an object region with an ellipse; 

FIG. 4 is a diagram showing an example of a process for detecting a representative point of an approximate ellipse
of an object region;

FIG. 5 is a diagram showing an example of the structure of object region data;
FIG. 6 is a diagram showing an example of the structure of data of an approximate figure in object region data;

FIG. 7 is a diagram showing an example of the structure of data of a trajectory of a representative point in data of
an approximate figure;

FIG. 8 is a diagram showing an example of representative points when the approximate figure is a parallelogram;
FIG. 9 is a diagram showing an example of representative points when the approximate figure is a polygon;
FIG. 10 is a flowchart showing an example of a procedure according to the first embodiment of the present invention;

FIG. 11 is a diagram showing an example in which the object region in a video is expressed with a plurality of
ellipses;

FIG. 12 is a diagram showing an example of the structure of object region data including data of a plurality of
approximate figures;

FIGS. 13A, 13B and 13C are diagrams schematically showing another process for describing an object region in a
video with object region data;

FIG. 14 is a flowchart showing an example of a procedure for obtaining an approximate rectangle;
FIG. 15 is a diagram showing a state in which an inclined and elongated object is approximated with a non-inclined
rectangle;

FIGS. 16A and 16B are diagrams showing a state in which an object has been approximated with a rectangle having
an inclination corresponding to the inclination of the object;

FIG. 17 is a flowchart showing another example of a procedure for obtaining the approximate rectangle;
FIG. 18 is a diagram showing a method of obtaining an approximate ellipse from an approximate rectangle;
FIG. 19 is a flowchart showing an example of a procedure for obtaining an approximate ellipse from an approximate 
rectangle; 

FIG. 20 is a diagram showing a method of making representative points of approximate figures to correspond to 
one another between adjacent frames; 

FIG. 21 is a flowchart showing an example of a procedure for making representative points of approximate figures 
to correspond to one another between adjacent frames; 

FIG. 22 is a diagram showing another example of the structure of object region data; 

FIG. 23 is a diagram showing an example of the correspondence among the ID of types of figures, the type of the 
figures and the number of representative points; 

FIG. 24 is a diagram showing an example of the correspondence among the ID of a function, the form of the func- 
tion and the function parameter and the limit condition; 

FIG. 25 is a diagram showing a specific example of the structure of data about related information; 
FIG. 26 is a diagram showing another specific example of the structure of data about related information; 
FIG. 27 is a diagram showing an example of an object-region-data generating apparatus according to a second embodiment of the present invention;

FIG. 28 is a flowchart showing an example of a procedure according to the second embodiment; 

FIG. 29 is a diagram showing an example of the structure of a video processing apparatus according to a third embodiment of the present invention;

FIG. 30 is a flowchart showing an example of a procedure according to the third embodiment; 
FIG. 31 is a diagram showing an example of display of hyper media contents which use object region data;
FIG. 32 is a flowchart showing another example of the procedure according to the third embodiment; 
FIG. 33 is a flowchart showing an example of a procedure according to a fourth embodiment of the present inven- 
tion; 

FIGS. 34A and 34B are diagrams showing an example of change in the display of an object region having related 
information; 

FIG. 35 is a diagram showing another example of the display of the position of an object region having related infor- 
mation; 

FIG. 36 is a diagram showing another example of the display of the position of an object region having related information;
FIG. 37 is a diagram showing an example of display of a description list of an object region having related informa- 
tion; 

FIG. 38 is a diagram showing an example of display of an object region having related information with an icon;
FIG. 39 is a diagram of an example of display of an object region having related information with a map; 
FIGS. 40A and 40B are diagrams showing an example of control of an image reproducing rate for facilitating 
instruction of an object region; 

FIG. 41 is a diagram showing an example which enables image capture for facilitating instruction of an object 
region; 

FIG. 42 is a flowchart showing an example of a procedure according to a fifth embodiment of the present invention; 
and 

FIG. 43 is a flowchart showing another example of the procedure according to the fifth embodiment. 

[0049] A preferred embodiment of an object-region-data generating apparatus according to the present invention 
will now be described with reference to the accompanying drawings. 

First Embodiment 

[0050] FIG. 1 is a block diagram showing the structure of a first embodiment of the present invention. As shown in FIG. 1, an object-region-data generating apparatus incorporates a video data storage portion 100, a region extracting portion 101, a region figure approximating portion 102 for approximating a region with a figure, a figure-representative-point extracting portion 103, a representative point trajectory curve approximating portion 104 for approximating the trajectories of representative points with a curve, a related information storage portion 105 and a region data storage portion 106. A case will now be described in which the process according to this embodiment (in particular, the processes performed by the region extracting portion 101 or the region figure approximating portion 102) permits operations by a user. In this case, a GUI (not shown in FIG. 1) is employed which displays video data in, for example, frame units and accepts instructions input by the user.
[0051] The video data storage portion 100 stores video data and comprises, for example, a hard disk, an optical disk or a semiconductor memory.
[0052] The region extracting portion 101 extracts partial regions of the video data. Each partial region is the region of an object in the video, such as a specific person, a vehicle or a building (or, alternatively, a part of an object, for example, the head of a person, the bonnet of a vehicle or the front door of a building). Usually, the same object appears in successive frames of a video. The region corresponding to the same object frequently changes owing to movement of the object or shaking of the camera during the image pick-up operation.

[0053] The region extracting portion 101 extracts the object region in each frame, following the movement or deformation of the object of interest. Specifically, the extraction is performed by a method of manually specifying a region in every frame. Another method may be employed with which the contour of an object is continuously extracted by using an active contour model called "Snakes" as disclosed in M. Kass et al., "Snakes: Active contour models" (International Journal of Computer Vision, Vol. 1, No. 4, pp. 321-331, July 1988). Also a method disclosed in "Method of tracing high-speed mobile object for producing hyper media contents by using robust estimation" (CVIM 113-1, 1998, technical report of Information Processing Society of Japan) may be employed. According to that disclosure, deformation and movement of the overall body of an object are estimated from the positions to which partial object regions have moved, as detected by block matching. Alternatively, a method of identifying a region having similar colors by performing growing and division of a region as disclosed in Image Analysis Handbook (Chapter-2, Section II, Publish Conference of Tokyo University, 1991) may be employed.
[0054] The region figure approximating portion 102 approximates the object region extracted by the region extracting portion 101 with a predetermined figure. The figure may be an arbitrary figure, such as a rectangle, a circle, an ellipse or a polygon. The region may be approximated with a figure circumscribing the region or with a figure inscribed in the region. Alternatively, a method may be employed with which the centroid of the region is employed as the centroid of the approximate figure, or with which the area ratio of the region and the approximate figure is kept the same. As an alternative to approximating every object region with one predetermined type of figure, the type of figure may be specified by a user for each object to be approximated. Another method may be employed with which the type of figure is automatically selected for each object to be approximated in accordance with the shape of the object or the like.
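As a minimal sketch of two of the options named in paragraph [0054], the Python fragment below computes a non-inclined rectangle circumscribing a region and the centroid of the region. It assumes that the region is given as a non-empty list of (x, y) pixel coordinates; the function names are hypothetical and do not appear in this specification.

```python
def circumscribing_rectangle(pixels):
    """Return (x_min, y_min, x_max, y_max) of the non-inclined rectangle
    circumscribing a non-empty list of (x, y) pixel coordinates."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def region_centroid(pixels):
    """Return the centroid (mean x, mean y) of the region pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


region = [(2, 3), (5, 4), (3, 8)]        # illustrative pixel coordinates
rect = circumscribing_rectangle(region)
center = region_centroid(region)
```

The centroid could serve as the centroid of an approximate figure, while the rectangle is the simplest circumscribing figure; an inclined rectangle or an ellipse would need additional steps.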

[0055] The approximation of the region with the figure is performed for each frame whenever a result of extraction performed by the region extracting portion 101 is input. Alternatively, the approximation with a figure may use the extraction results of a plurality of preceding and following frames. When the extraction results of plural frames are used, changes in the size and position of the approximate figure are smoothed over those frames, so that the movement and deformation of the approximate figure become smooth or an error in the extraction of the region becomes inconspicuous. Note that the size of the approximate figure may vary among the frames.
[0056] The figure-representative-point extracting portion 103 extracts representative points of the approximate figure which is the output of the region figure approximating portion 102. Which points are employed as the representative points depends on the type of the approximate figure. When the approximate figure is, for example, a rectangle, four or three of its vertices may be the representative points. When the approximate figure is a circle, the representative points may be the center and one point on the circumference, or the two end points of a diameter. When the approximate figure is an ellipse, the representative points may be the vertices of a rectangle circumscribing the ellipse, or the two focal points and one point on the ellipse (for example, one point on the minor axis). When an arbitrary closed polygon is the approximate figure, its vertices may be the representative points.

[0057] The representative points are extracted in frame units whenever information about the approximate figure for one frame is output from the region figure approximating portion 102. Each representative point is expressed by its coordinate in the horizontal (X) direction and its coordinate in the vertical (Y) direction.

[0058] The representative point trajectory curve approximating portion 104 approximates, in time sequence, the sequence of positions of each representative point extracted by the figure-representative-point extracting portion 103 with a curve. The approximate curve is expressed, for each of the X coordinate and the Y coordinate of each representative point, as a function of the frame number f or the time stamp t given to the video. The approximation with the curve may be approximation with a straight line or approximation with a spline curve.

[0059] The related information storage portion 105 stores related information about the objects which appear in the video data stored in the video data storage portion 100 (or, alternatively, information about the address at which the related information is stored in another storage apparatus, for example, a server on the Internet or a LAN). Related information may be characters, voice, a still image, a moving image or a combination of these. Alternatively, related information may be data describing the operation of a program or a computer. Similarly to the video data storage portion 100, the related information storage portion 105 comprises, for example, a hard disk, an optical disk or a semiconductor memory.

[0060] The region data storage portion 106 is a storage medium which stores object region data including data expressing the formula of the curve approximating the time-sequential trajectory of the representative points, which is the output of the representative point trajectory curve approximating portion 104. When related information about the object corresponding to the region expressed by the function has been stored in the related information storage portion 105, the object region data may include the related information and the address at which the related information has been recorded. When information about the address of recorded related information has been stored in the related information storage portion 105, that address information may also be recorded. Similarly to the video data storage portion 100 and the related information storage portion 105, the region data storage portion 106 comprises, for example, a hard disk, an optical disk or a semiconductor memory.

[0061] The video data storage portion 100, the related information storage portion 105 and the region data storage portion 106 may be constituted by individual storage apparatuses. Alternatively, all or some of them may be constituted by one storage apparatus.
[0062] The object-region-data generating apparatus may be realized by software which runs on a computer.

[0063] The operation of the object-region-data generating apparatus will specifically be described. 
[0064] FIGS. 2A, 2B, 2C and 2D are diagrams showing the sequence of processes more specifically. The sequence consists of the process performed by the region extracting portion 101 to extract the object region, the process performed by the region figure approximating portion 102 to approximate the region, the process performed by the figure-representative-point extracting portion 103 to extract the representative points of the figure, and the process performed by the representative point trajectory curve approximating portion 104 to approximate the representative point trajectory with a curve.

[0065] In this case, the region figure approximating portion 102 employs a method of approximating the region with an ellipse. The figure-representative-point extracting portion 103 employs a method of extracting the two focal points of the ellipse and one point on the ellipse. The representative point trajectory curve approximating portion 104 employs a method of approximating the sequence of the representative points with a spline function.

[0066] Referring to FIG. 2A, reference numeral 200 represents a video of one frame which is to be processed. Reference numeral 201 represents the object region which is to be extracted. A process for extracting the object region 201 is performed by the region extracting portion 101. Reference numeral 202 represents an ellipse which is a result of approximation of the object region 201 with an ellipse. A process for obtaining the ellipse 202 from the object region 201 is performed by the region figure approximating portion 102.

[0067] FIG. 3 shows an example of the method of obtaining an approximate ellipse when the object region is expressed by a parallelogram. Points A, B, C and D shown in FIG. 3 are the vertices of the parallelogram which is the object region. In this case, it is first determined by calculation which of side AB and side BC is the longer. Then, the smallest rectangle whose sides contain the longer side and the side opposite to it is determined. In the case shown in FIG. 3, the rectangle having the four points A, B', C and D' as its vertices is the smallest rectangle. The approximate ellipse is the circumscribing ellipse which is similar to the ellipse inscribed in the rectangle and which passes through the points A, B', C and D'.

[0068] Referring to FIG. 2B, reference numerals 203 represent the representative points of the figure expressing the ellipse. Specifically, the representative points are the two focal points of the ellipse and one point on the ellipse (one point on the minor axis in the case shown in FIG. 2B). The focal points of the ellipse can easily be determined from points on the two axes or from a rectangle circumscribing the ellipse. An example will now be described in which focal points F and G are determined from two points P0 and P1 on the major axis and point H on the minor axis shown in FIG. 4.

[0069] Initially, a and b, which are the parameters of the major axis and the minor axis, the center C of the ellipse and the eccentricity e are determined as follows:

$$E(P_0, P_1) = 2 \times a$$

$$C = (P_0 + P_1)/2$$

$$E(C, H) = b$$

$$e = (1/a) \times \sqrt{a \times a - b \times b}$$

where E(P, Q) is the Euclidean distance between the point P and the point Q. In accordance with the determined parameters, the focal points F and G can be determined as follows:

$$F = C + e \times (P_0 - C)$$

$$G = C - e \times (P_0 - C)$$
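The calculation in paragraph [0069] can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, points are assumed to be (x, y) tuples, and the major axis is assumed to be non-degenerate and at least as long as the minor axis (a > 0 and a >= b).

```python
import math


def ellipse_representative_points(p0, p1, h):
    """Return the representative points (F, G, H) of an ellipse, given the two
    end points p0 and p1 of its major axis and a point h on its minor axis."""
    a = math.dist(p0, p1) / 2                        # E(P0, P1) = 2 x a
    c = ((p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2)   # C = (P0 + P1) / 2
    b = math.dist(c, h)                              # E(C, H) = b
    e = math.sqrt(a * a - b * b) / a                 # eccentricity
    f = (c[0] + e * (p0[0] - c[0]), c[1] + e * (p0[1] - c[1]))  # F = C + e(P0 - C)
    g = (c[0] - e * (p0[0] - c[0]), c[1] - e * (p0[1] - c[1]))  # G = C - e(P0 - C)
    return f, g, h


# Illustrative ellipse: a = 5 and b = 3, so the foci lie 4 units from the center.
f, g, h = ellipse_representative_points((-5.0, 0.0), (5.0, 0.0), (0.0, 3.0))
```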

[0070] Thus, the representative points F, G and H of the ellipse are determined. When these points are made to correspond to the representative points of the ellipse extracted in another frame, ambiguity is involved. That is, there are two possible ways of pairing the two extracted focal points with the two focal points in the previous frame. Moreover, since the minor axis intersects the ellipse at two points, it cannot be determined directly which intersection corresponds to the point on the ellipse extracted in the previous frame. A method of determining the pairing and the intersection will now be described.

[0071] An assumption is made that the two focal points extracted in the previous frame are Fp and Gp. To determine which of F and G corresponds to Fp, the following two distances are compared:

E((Gp - Fp)/2, (G - F)/2) and

E((Gp - Fp)/2, (F - G)/2)

[0072] When the former distance is smaller, Fp is made to correspond to F, and Gp is made to correspond to G. When the latter distance is smaller, Fp is made to correspond to G, and Gp is made to correspond to F.
[0073] An assumption is made that the intersection between the minor axis and the ellipse selected in the previous frame is Hp, and that the intersections between the minor axis and the ellipse in the present frame are H and H'. Which of H and H' must be made to correspond to Hp is determined by calculating two distances:

E(Hp - (Gp + Fp)/2, H - (F + G)/2) and

E(Hp - (Gp + Fp)/2, H' - (F + G)/2)

[0074] When the former distance is shorter, H is selected. Otherwise, H' is selected. Note that the intersection H between the minor axis and the ellipse in the first frame may be either of the two intersections.
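The two comparisons in paragraphs [0071] to [0074] can be sketched as follows. The function and helper names are hypothetical, and points are assumed to be (x, y) tuples. The behavior when the two distances for the focal points are exactly equal is not stated in the text; keeping the original order in that case is an assumption of this sketch.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def _half(p):
    return (p[0] / 2, p[1] / 2)


def _mid(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def match_foci(fp, gp, f, g):
    """Return the present-frame foci ordered as (match of Fp, match of Gp)."""
    prev = _half(_sub(gp, fp))
    same = math.dist(prev, _half(_sub(g, f)))      # E((Gp-Fp)/2, (G-F)/2)
    swapped = math.dist(prev, _half(_sub(f, g)))   # E((Gp-Fp)/2, (F-G)/2)
    return (f, g) if same <= swapped else (g, f)   # tie handling is an assumption


def match_minor_axis_point(hp, fp, gp, h, h_alt, f, g):
    """Return whichever of h and h_alt (H and H') corresponds to Hp."""
    prev = _sub(hp, _mid(gp, fp))                  # Hp - (Gp + Fp)/2
    center = _mid(f, g)                            # (F + G)/2
    d_h = math.dist(prev, _sub(h, center))
    d_alt = math.dist(prev, _sub(h_alt, center))
    return h if d_h < d_alt else h_alt             # otherwise H' is selected


# The present frame reports the foci in the opposite order to the previous one.
pair = match_foci((-4, 0), (4, 0), (5, 1), (-3, 1))
chosen = match_minor_axis_point((0, 3), (-4, 0), (4, 0), (1, -2), (1, 4), (5, 1), (-3, 1))
```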
[0075] The foregoing process for extracting the representative points from the ellipse is performed by the figure-representative-point extracting portion 103.

[0076] The positions of the representative points extracted by the foregoing process usually vary among the successive frames owing to movement of the object of interest in the video or shaking of the image pick-up camera. Therefore, the corresponding representative points of the ellipses are arranged in time sequence and approximated with a spline function for each of the X and Y coordinates. In this embodiment, each of the three points F, G and H (see FIG. 4), which are the representative points of the ellipse, requires one spline function for the X coordinate and one for the Y coordinate. Therefore, six spline functions are produced.

[0077] The approximation to a curve with spline functions is performed by the representative point trajectory curve 
approximating portion 104. 

[0078] The process which is performed by the representative point trajectory curve approximating portion 104 may be carried out whenever the coordinates of the representative points of a frame relating to the object region are obtained. For example, the approximation is performed whenever the coordinates of the representative points in a frame are obtained, the approximation error is calculated, and the approximation section is divided as required so that the approximation error stays within a predetermined range. Another method may be employed with which the process is performed after the coordinates of the representative points in all of the frames relating to the object region have been obtained.
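The error-driven division of the approximation section described in paragraph [0078] might be sketched as below. For brevity the sketch fits a straight line to each section by least squares, which the description names as one permissible form of approximation, rather than a higher-order spline. The function names, the greedy splitting strategy and the tolerance parameter are assumptions of this sketch, not part of the specification.

```python
def fit_line(ts, vs):
    """Least-squares line v = c0 + c1 * t; returns (c0, c1)."""
    n = len(ts)
    mean_t, mean_v = sum(ts) / n, sum(vs) / n
    var = sum((t - mean_t) ** 2 for t in ts)
    if var == 0:
        return mean_v, 0.0
    c1 = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var
    return mean_v - c1 * mean_t, c1


def max_error(ts, vs, coeffs):
    c0, c1 = coeffs
    return max(abs(c0 + c1 * t - v) for t, v in zip(ts, vs))


def approximate_trajectory(ts, vs, tolerance):
    """Greedily extend each section one sample at a time; when the fitting error
    would exceed the tolerance, close the section and start a new one at the
    last accepted sample, so that adjacent sections share a knot.
    Returns a list of (t_start, t_end, (c0, c1))."""
    sections, start, n = [], 0, len(ts)
    while start < n - 1:
        end = start + 1                       # a section spans at least two samples
        coeffs = fit_line(ts[start:end + 1], vs[start:end + 1])
        while end + 1 < n:
            trial = fit_line(ts[start:end + 2], vs[start:end + 2])
            if max_error(ts[start:end + 2], vs[start:end + 2], trial) > tolerance:
                break
            end, coeffs = end + 1, trial
        sections.append((ts[start], ts[end], coeffs))
        start = end
    return sections


# One coordinate of one representative point over frames 0 to 16 (made-up data).
ts = list(range(17))
vs = [float(t) if t <= 5 else 3.0 * t - 10.0 for t in ts]
sections = approximate_trajectory(ts, vs, tolerance=0.5)
```

With this made-up data the trajectory changes slope at t = 5, so the sketch produces two sections whose boundaries are the three knots 0, 5 and 16.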

[0079] Reference numeral 204 shown in FIG. 2C represents the approximated spline function expressed three-dimensionally. Reference numeral 205 shown in FIG. 2D represents an example of the spline function which is the output of the representative point trajectory curve approximating portion 104 (only one coordinate axis of one representative point is shown). In this example, the approximation section is divided into two sections (the number of knots is three), which are t = 0 to 5 and t = 5 to 16.

[0080] The thus-obtained spline functions are recorded in the region data storage portion 106 in a predetermined 
data format. 

[0081] As described above, this embodiment enables the object region in a video to be described by the parameters of curves approximating the time-sequential trajectories (the trajectories of the coordinates of the representative points, with the frame number or the time stamp as the variable) of the representative points of the approximate figure of the region.

[0082] According to this embodiment, the object region in a video can be expressed by only the parameters of the function. Therefore, object region data which is small in quantity and easy to handle can be produced. Also, the representative points can easily be extracted from the approximate figure, and the parameters of the approximate curve can easily be produced. Moreover, the approximate figure can easily be reproduced from the parameters of the approximate curve.

[0083] A method may be employed with which basic figures, for example, one or more ellipses, are employed as the approximate figures and each ellipse is represented by two focal points and another point. In this case, whether or not arbitrary coordinates specified by a user lie in the region (the approximate figure) of the object (that is, whether or not the object region has been specified) can be determined by a simple determination formula. Thus, a moving object in a video can be specified still more easily by the user.
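The text does not spell out the determination used in paragraph [0083]. One natural candidate, shown here as an assumption, is the focal-distance property of the ellipse: a point lies inside or on the ellipse when the sum of its distances to the two focal points does not exceed the corresponding sum for the known point on the ellipse. The function name is hypothetical.

```python
import math


def inside_ellipse(point, f, g, h):
    """Return True when `point` lies inside or on the ellipse that has focal
    points f and g and passes through h (all points are (x, y) tuples)."""
    return (math.dist(point, f) + math.dist(point, g)
            <= math.dist(h, f) + math.dist(h, g))


# Ellipse with focal points (-4, 0) and (4, 0) passing through (0, 3).
hit = inside_ellipse((0.0, 2.0), (-4.0, 0.0), (4.0, 0.0), (0.0, 3.0))
miss = inside_ellipse((6.0, 0.0), (-4.0, 0.0), (4.0, 0.0), (0.0, 3.0))
```

A test of this kind needs only the three representative points evaluated at the frame in question, so no explicit conversion to center, axes and inclination is required.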
[0084] The data format of object region data which is stored in the region data storage portion 106 will now be described. A case will be described in which the trajectories of the representative points are approximated with a spline function. As a matter of course, a case in which they are approximated with another function is handled similarly.

[0085] FIG. 5 shows an example of the data format of object region data for recording the spline function indicating the object region in a video and information related to the object.

[0086] ID number 400 is an identification number which is given to each object. Note that the foregoing data item 
may be omitted. 

[0087] A leading frame number 401 and a trailing frame number 402 are the leading and trailing frame numbers defining the existence of the object having the foregoing ID number. Specifically, the numbers 401 and 402 are the frame number at which the object appears in the video and the frame number at which the object disappears. The frame numbers are not required to be the frame numbers at which the object actually appears and disappears in the video. For example, an arbitrary frame number after the appearance of the object in the video may be the leading frame number. An arbitrary frame number which follows the leading frame number and which precedes the frame of disappearance of the object in the video may be the trailing frame number. The leading/trailing time stamp may be substituted for the leading/trailing frame number. The object existence frame number or object existence time may be substituted for the trailing frame number 402.

[0088] A pointer (hereinafter called a "related information pointer") 403 for pointing to related information is the address or the like of the data region in which data of information related to the object having the foregoing ID number is recorded. When the related information pointer 403 is used, retrieval and display of information related to the object can easily be performed. The related information pointer 403 may be a pointer to data describing a program or the operation of a computer. In this case, when the object has been specified by a user, the computer performs a predetermined operation.

[0089] Note that the related information pointer 403 may be omitted when the objects are not required to perform individual operations.

[0090] The foregoing description deals with the case in which the related information pointer 403 is described in the object region data. As an alternative to using the pointer 403, the related information itself may be described in the object region data. Either the related information pointer 403 or the related information may be described in the object region data. In that case, a flag is required to indicate which of the related information pointer and the related information has been described in the object region data.

[0091] The approximate figure number 404 is the number of the figures approximating the object region. In the example shown in FIG. 2A, in which the object region is approximated with one ellipse, the number of the figures is 1.
[0092] Approximate figure data 405 is data (for example, the parameter of a spline function) of a trajectory of the 
representative point of the figure for expressing an approximate figure. 

[0093] Note that as many approximate figure data items 405 exist as the approximate figure number 404 indicates (a case where the approximate figure number 404 is two or larger will be described later).

[0094] The approximate figure number 404 of object region data may always be fixed to one (therefore, approximate figure data 405 is also always one item), so that the field for the approximate figure number 404 can be omitted.
[0095] FIG. 6 shows the structure of approximate figure data 405 (see FIG. 5). 

[0096] A figure type ID 1300 is identification data indicating the type of figure serving as the approximate figure; it identifies, for example, a circle, an ellipse, a rectangle or a polygon.

[0097] A representative point number 1301 indicates the number of representative points of the figure specified by the figure type ID 1300. Note that the number of the representative points is expressed with M.
[0098] Representative point trajectory data items 1302 and 1303 form a pair of data regions relating to the spline functions expressing the trajectory of a representative point of the figure. Each representative point of a figure requires data of one pair of spline functions, one for the X coordinate and one for the Y coordinate. Therefore, as many trajectory data items specifying a spline function exist as the representative point number (M) x 2.
[0099] Note that the type of the employed approximate figure may previously be limited to one type, for example, an ellipse. In this case, the field for the figure type ID 1300 shown in FIG. 6 may be omitted.
[0100] When the representative point number is defined according to the figure type ID 1300, the representative point number may be omitted.

[0101] FIG. 7 shows an example of the structure of representative point trajectory data 1302 and 1303. 

[0102] A knot frame number 1400 indicates a knot of the spline function, and thus indicates that polynomial data 1403 is effective up to that knot. The number of polynomial coefficient data items 1402 varies according to the highest order of the spline function (assuming that the highest order is K, the number of coefficient data items is K + 1). Therefore, reference is made to a polynomial order 1401. Subsequent to the polynomial order 1401, as many polynomial coefficients 1402 follow as the polynomial order (K) + 1.

[0103] Since the spline function is expressed by a separate polynomial between each pair of adjacent knots, as many polynomials are required as corresponds to the number of knots. Therefore, data 1403 including the knot frame number and the coefficients of the polynomial is described repeatedly. When the knot frame number is equal to the trailing frame number, the data is the last polynomial coefficient data. Thus, the end of the representative point trajectory data can be recognized.
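The fields of FIGS. 5 to 7 described above could be mirrored in memory roughly as follows. This is a hedged sketch: the class and attribute names are hypothetical, the byte-level layout is not modelled, and it is assumed that each polynomial takes the absolute frame number as its variable, with coefficients stored lowest order first, and is effective up to and including its knot frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PolynomialSection:            # one repetition of data 1403
    knot_frame: int                 # knot frame number 1400
    coefficients: List[float]       # coefficients 1402, K + 1 values

    @property
    def order(self) -> int:         # polynomial order 1401 (K)
        return len(self.coefficients) - 1


@dataclass
class Trajectory:                   # representative point trajectory data 1302 / 1303
    sections: List[PolynomialSection]

    def value_at(self, frame: float) -> float:
        """Evaluate the first section whose knot frame is not before `frame`."""
        for s in self.sections:
            if frame <= s.knot_frame:
                return sum(c * frame ** k for k, c in enumerate(s.coefficients))
        raise ValueError("frame lies after the last knot")


@dataclass
class ApproximateFigure:            # approximate figure data 405
    figure_type_id: int             # figure type ID 1300
    points: List[Tuple[Trajectory, Trajectory]]   # M pairs of (X, Y) trajectories


@dataclass
class ObjectRegionData:             # FIG. 5
    leading_frame: int              # leading frame number 401
    trailing_frame: int             # trailing frame number 402
    figures: List[ApproximateFigure] = field(default_factory=list)
    object_id: Optional[int] = None               # ID number 400 (may be omitted)
    related_info_pointer: Optional[str] = None    # pointer 403 (may be omitted)


x_traj = Trajectory([PolynomialSection(5, [0.0, 1.0]),
                     PolynomialSection(16, [-10.0, 3.0])])
```

In this sketch the approximate figure number 404 and the representative point number 1301 are implied by the lengths of the `figures` and `points` lists.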

[0104] A case will now be described in which a figure other than an ellipse is employed as the approximate figure.
[0105] FIG. 8 is a diagram showing the representative points in a case where a parallelogram is employed as the approximate figure. Points A, B, C and D are the vertices of the parallelogram. When three of the four vertices are determined, the remaining one is determined uniquely. Therefore, only three of the four vertices are required to serve as the representative points. In the foregoing example, three points, which are A, B and C, are employed as the representative points.
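Recovering the vertex that is not stored can be sketched in one line. The sketch assumes that A, B, C and D are ordered around the parallelogram, so that D is the vertex opposite B; the function name is hypothetical.

```python
def fourth_vertex(a, b, c):
    """Return vertex D of parallelogram ABCD from vertices A, B and C,
    using the fact that the diagonals AC and BD share a midpoint."""
    return (a[0] + c[0] - b[0], a[1] + c[1] - b[1])


d = fourth_vertex((0, 0), (4, 0), (5, 2))
```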

[0106] FIG. 9 is a diagram showing representative points in a case where a polygon is employed as the approximate figure. In the case of a polygon, the vertices are ordered along the outer circumference. Since the example shown in FIG. 9 has 10 vertices, all of the vertices N1 to N10 are employed as the representative points. In this case, the number of vertices may be reduced by employing only vertices each having an internal angle smaller than 180° as the representative points.
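The vertex reduction mentioned in paragraph [0106] might be sketched as follows. The sketch assumes a simple (non-self-intersecting) polygon given as a list of (x, y) vertices in order along its outline, in either direction; the function name is hypothetical.

```python
def convex_vertices(polygon):
    """Keep only the vertices whose internal angle is smaller than 180 degrees."""
    n = len(polygon)
    # Twice the signed area; its sign gives the direction of traversal.
    area2 = sum(polygon[i][0] * polygon[(i + 1) % n][1]
                - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    kept = []
    for i in range(n):
        (px, py), (cx, cy), (nx, ny) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross * area2 > 0:       # the turn agrees with the traversal direction
            kept.append(polygon[i])
    return kept


# A square with a dent at (2, 2); the dent has an internal angle above 180 degrees.
reduced = convex_vertices([(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)])
```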

[0107] As described above, the foregoing process may be performed by software which runs on a computer. FIG. 10 is a flowchart showing the process which is performed by the video processing apparatus according to this embodiment. When the video processing apparatus according to this embodiment is realized by software, a program according to the flowchart shown in FIG. 10 is produced.

[0108] In step S11, video data for one frame is extracted from the video data storage portion 100.
[0109] In step S12, the region of a predetermined object in the video is extracted. Extraction may be performed by a method similar to that employed by the region extracting portion 101.

[0110] In step S13, an approximate figure is fitted to the region data which is the result of the process performed in step S12. The approximation method may be similar to that employed by the region figure approximating portion 102.
[0111] In step S14, the representative points of the figure approximated in step S13 are extracted. A method similar to that employed by the figure-representative-point extracting portion 103 may be employed.

[0112] In step S15, the positions of the sequence of representative points of the approximate figure in the successive frames are approximated with a curve. A method similar to that employed by the representative point trajectory curve approximating portion 104 may be employed.

[0113] In step S16, a branching process is performed. That is, it is determined whether or not the processed image is the final frame, or whether or not the object to be extracted has disappeared from the processed frame (or is considered to have disappeared). In an affirmative case, the process branches to step S17. In a negative case (both of the conditions are negated), the process branches to step S11.
[0114] In step S17, the approximate curve calculated in step S15 is recorded in a recording medium as object region data in accordance with a predetermined format.
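The control flow of steps S11 to S17 can be sketched as a loop over frames. All four callables below are hypothetical placeholders for the methods of portions 101 to 104, and the convention that the extraction returns None once the object has disappeared is an assumption of this sketch. For simplicity the curve approximation of step S15 is run once after the loop, which corresponds to the variant in which the approximation is performed after the representative points of all frames have been obtained.

```python
def generate_object_region_data(frames, extract_region, approximate_figure,
                                extract_points, fit_trajectories):
    """Run steps S11 to S16 over `frames` and return the fitted trajectories,
    ready to be recorded as object region data in step S17."""
    point_history = []
    for frame in frames:                                # S11: take one frame
        region = extract_region(frame)                  # S12: extract the object region
        if region is None:                              # S16: the object has disappeared
            break
        figure = approximate_figure(region)             # S13: fit the approximate figure
        point_history.append(extract_points(figure))    # S14: representative points
    return fit_trajectories(point_history)              # S15, run after the loop here


# Stub callables that only demonstrate the control flow.
result = generate_object_region_data(
    frames=[1, 2, 3, 4],
    extract_region=lambda frame: frame if frame < 3 else None,
    approximate_figure=lambda region: region * 10,
    extract_points=lambda figure: [figure],
    fit_trajectories=lambda history: history,
)
```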
[0115] The example has been described in which one figure is assigned to one object to roughly express the object region. The accuracy of approximation may be improved by approximating the region of one object with a plurality of figures. FIG. 11 shows an example in which one object is approximated with a plurality of figures. In this case, the region of a person in the image is expressed with six ellipses 600 to 605.
[0116] When one object is expressed with plural figures as shown in FIG. 11, a process for dividing the object into a plurality of regions must be performed. The process may be performed by an arbitrary method. For example, the object may be divided directly by hand. In that case, a pointing device, such as a mouse, is used to enclose each region on the image with a rectangle or an ellipse. Alternatively, the region is specified with a trajectory of the pointing device. When an automatic method is employed instead of manual operation, the division may be realized by clustering the movement of the object. With this method, the movement of each region in the object among the successive frames is determined by a correlation method (refer to, for example, Image Analysis Handbook, Chapter-3, Section II, Publish Conference of Tokyo University, 1991) or a method with gradient constraints (refer to, for example, B. K. P. Horn and B. G. Schunck, "Determining optical flow", Artificial Intelligence, Vol. 17, pp. 185-203, 1981), and similar movements are collected to form a region.

[0117] Each of the divided regions is subjected to the process performed by the example structure shown in FIG. 1 or the procedure shown in FIG. 10, so that data of the approximate figure is produced. In this case, the number of spline functions which must be described in the object region data of one object increases as the number of approximate figures increases. Therefore, a data structure is formed which includes as many items of approximate figure data 405 as the number (L in this case) indicated by the approximate figure number 404, as shown in FIG. 12.
[0118] As described above, the field for the approximate figure number 404 may be omitted by always setting the approximate figure number of the object region data to one (so that the object region data always contains data of exactly one approximate figure). In this case, one object can be expressed with a plurality of figures by producing object region data for each figure approximating the object (the same ID number is given). That is, approximate figure data (1) to approximate figure data (L) 405 shown in FIG. 12 need only be replaced with partial data (1) to partial data (L) of a certain region (for example, a region 605).

[0119] When one object is expressed with a plurality of figures in this embodiment, figures of the same type are employed. Alternatively, a mixture of a plurality of types of figures may be employed.

[0120] Variations of the method of using region data produced and recorded in this embodiment will now be described. Although a person, an animal, a building or a plant is usually considered as the object in a video, the process according to this embodiment may be applied to any object in the video. For example, a telop may be handled as an object in a video. Therefore, a process in which a telop is employed as a variation of the object will now be described.
[0121] The telop is character information added to the image. In the U.S., character information called a "closed caption" must be added. In broadcasts in Japan, telops are used with increasing frequency. Telops to be displayed include a still telop and moving telops, such as a telop which is scrolled upwards in the screen and a telop which is scrolled from the right to the left of the screen. When the region in which the telop is being displayed is approximated with a figure and the telop character train is stored as related information, the contents of the image can easily be recognized and a predetermined image can easily be retrieved.

[0122] The region extracting portion 101 performs a process by employing a method with which a telop region is manually specified. Another method may be employed which has been disclosed in "Method of Extracting Character Portion from Video to Recognize Telop" (Hori, 99-CVIM-114, pp. 129-136, 1999, "Information Processing Society of Japan Technical Report") and with which the brightness and edge information of characters are employed to extract a character train. Another method has been disclosed in "Improvement in Accuracy of Newspaper Story Based on Telop Character Recognition of News Video" (Katayama et al., Vol. 1, pp. 105-110, proceedings of Meeting on Image Recognition and Understanding (MIRU '98)) to separate the background and the telop from each other by examining the intensity of edges. Thus, the telop region is extracted. Each character and each character train may be cut from the obtained telop region. Edge information in the telop region in successive frames is compared to detect the frame in which the telop has appeared and the frame in which the telop has disappeared.

[0123] The region figure approximating portion 102 performs a process to approximate the telop region extracted by the region extracting portion 101 with a rectangle. The number of the frame in which the telop has appeared is stored in the leading frame number of object region data (401 shown in FIG. 5 or FIG. 12). On the other hand, the number of the frame in which the telop has disappeared is stored in the trailing frame number 402. A pointer to the character train information of the telop is stored in the related information pointer 403 for pointing related information. As approximate figure data 405, rectangular region data encircling the telop is stored. When each row of a telop composed of a plurality of rows is made to be an individual region or when each character is made to be an individual region, the number of rows or characters is stored in the approximate figure number 404. Rectangular region data encircling each row or character, that is, approximate figure data 405, is stored by the corresponding number.

[0124] The figure-representative-point extracting portion 103 and the representative point trajectory curve approximating portion 104 perform processes as described above, because these processes include no portion specialized for the telop.

[0125] The character train information of the telop which has appeared is stored in the related information storage portion 105. Moreover, a pointer to that information is stored in telop region data (object region data).
[0126] When a keyword has been input and a character train corresponding or relating to the keyword is included in the character train information of the telop, the frame and time at which the character train appears can easily be detected. If the image is a news program, articles of interest can be retrieved so that only those articles are viewed.
[0127] In this case, adding, to the character train information of the telop, a pointer to the object region data corresponding to the frame or time facilitates the retrieval.
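The keyword retrieval described above can be illustrated with a minimal sketch. The record layout is an assumption made only for this illustration: each telop record is a dictionary whose hypothetical keys `leading_frame`, `trailing_frame` and `text` stand in for the leading frame number 401, the trailing frame number 402 and the character train information reached through the related information pointer 403.

```python
def find_telops(telop_records, keyword):
    """Return (leading_frame, trailing_frame) for every telop whose text contains keyword.

    The dictionary keys are hypothetical names chosen for this sketch; they are
    not field names defined by the description format.
    """
    hits = []
    for record in telop_records:
        if keyword in record["text"]:
            hits.append((record["leading_frame"], record["trailing_frame"]))
    return hits


records = [
    {"leading_frame": 120, "trailing_frame": 300, "text": "Weather forecast"},
    {"leading_frame": 450, "trailing_frame": 610, "text": "Election results"},
]
matches = find_telops(records, "Election")
```

Each returned pair gives the frame interval in which the matching telop is displayed, so playback can start directly at the leading frame.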

[0128] Thus, the telop is processed as described above. Other variations of the object may likewise be handled by the method of this embodiment.

[0129] Although the method of approximation using the ellipse has been described in the structure shown in FIG. 2, an approximation method using a rectangle will now be described as another approximation method.
[0130] FIGS. 13A, 13B and 13C are diagrams drawn in the same manner as FIGS. 2A, 2B, 2C and 2D. In this case, the region figure approximating portion 102 employs a method of approximating a region with a rectangle. The figure-representative-point extracting portion 103 employs a method of extracting the four vertices of the rectangle. The representative point trajectory curve approximating portion 104 employs an approximation method using a spline function.

[0131] Referring to FIG. 13A, reference numeral 2800 represents video data for one frame which is to be processed.

[0132] Reference numeral 2801 represents an object region which is to be extracted. A process for extracting the region 2801 of the object is performed by the region extracting portion 101.

[0133] Reference numeral 2802 represents a result of approximation of the object region with the rectangle. The process for obtaining the rectangle 2802 from the object region 2801 is performed by the region figure approximating portion 102.

[0134] An example of the process for obtaining the rectangle 2802 shown in FIG. 13A is shown in FIG. 14. That is, a mask image of the frame 2800 is raster-scanned (step S60). When the subject pixel is included in the object region (step S61), the stored minimum value is updated for each of the X and Y coordinates if the coordinate is smaller than the stored minimum value. Likewise, if the coordinate is larger than the stored maximum value, the maximum value is updated (step S62). The foregoing process is repeated for all of the pixels so that the minimum and maximum values of the pixel positions belonging to the object region 2801 are obtained for each of the X and Y coordinates. Thus, the coordinates of the four vertices of the rectangle 2802 can be obtained.
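The raster scan of steps S60 to S62 can be sketched as follows. This is only an illustration: the mask is assumed to be a list of rows in which nonzero entries belong to the object region, and the function name `bounding_rectangle` is hypothetical.

```python
def bounding_rectangle(mask):
    """Return the four vertices of the axis-aligned rectangle enclosing all
    nonzero pixels of mask, or None when the mask contains no object pixel."""
    min_x = min_y = max_x = max_y = None
    for y, row in enumerate(mask):            # raster scan (step S60)
        for x, value in enumerate(row):
            if not value:                     # pixel is outside the object (step S61)
                continue
            # Update the stored minimum and maximum values (step S62).
            if min_x is None or x < min_x:
                min_x = x
            if max_x is None or x > max_x:
                max_x = x
            if min_y is None or y < min_y:
                min_y = y
            if max_y is None or y > max_y:
                max_y = y
    if min_x is None:
        return None
    return [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
vertices = bounding_rectangle(mask)
```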

[0135] Although the above-mentioned method is simple to process, a large amount of non-object region is undesirably contained in the approximate rectangle 3002 when, for example, an elongated object 3001 lies diagonally with respect to a screen 3000 as shown in FIG. 15. Moreover, when the elongated object is rotated, the size and shape of the rectangle 2802 change. These facts sometimes obstruct identification and instruction of the object.

[0136] An example of an approximation method will now be described with which the size of the rectangle can be minimized (the amount of non-object region in the approximate rectangle can be minimized) and in which the attitude of the object is reflected.

[0137] Referring to FIG. 16A, reference numeral 3100 represents a video for one frame which is to be processed.
[0138] Reference numeral 3101 represents an object region which is to be extracted. A process for extracting the object region 3101 is performed by the region extracting portion 101.
[0139] Reference numeral 3102 represents a result of approximation of the object region. As distinct from the rectangle 2802 shown in FIG. 13A, the approximate rectangle 3102 is inclined. Only a small amount of non-object region exists in the region 3102. When the object is rotated, the shape of the region 3102 does not change. The process for obtaining the rectangle 3102 from the object region 3101 is performed by the region figure approximating portion 102.

[0140] FIG. 17 shows an example of the process. The process is arranged such that a principal axis of inertia of the object region is obtained. Moreover, an approximate figure is obtained in accordance with the principal axis of inertia.

[0141] Referring to FIG. 16B, reference numeral 3103 represents the centroid of the object region 3101.

[0142] Reference numeral 3104 represents the principal axis of inertia of the object region 3101. Reference numeral 3105 represents a straight line perpendicular to the principal axis of inertia 3104.
[0143] Initially, inertia moments $m_{20}$, $m_{02}$ and $m_{11}$ of the object region are obtained (steps S70 to S72).

[0144] Assuming that the mask image is $f(x, y)$, $f(x, y)$ is 1 in the region 3101 and 0 outside the region 3101. The inertia moment of the object region 3101 can be expressed as follows:

$$m_{ij} = \sum_{x}\sum_{y} x^{i} y^{j} f(x, y)$$

[0145] The inertia moment of $f(x, y)$ with respect to a straight line $y = x\tan\theta$ passing through the origin is obtained as follows:

$$m_{\theta} = \iint (x\sin\theta - y\cos\theta)^{2} f(x, y)\,dx\,dy$$

[0146] Let $\theta_0$ be the angle at which $m_{\theta}$ is minimized when $\theta$ is varied. When only one such angle exists, the straight line $y = x\tan\theta_0$ is called the principal axis of inertia.
[0147] Note that $\tan\theta_0$ can be obtained as a solution of the following quadratic equation:

$$\tan^{2}\theta + \frac{m_{20} - m_{02}}{m_{11}}\tan\theta - 1 = 0$$

[0148] When $\tan\theta_0$ is obtained around the centroid 3103, the principal axis of inertia of the object can be obtained (step S73).
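As a rough sketch of steps S70 to S73, the moments can be accumulated around the centroid and the minimizing angle computed directly. The closed form `0.5 * atan2(2 * m11, m20 - m02)` is used here in place of solving the quadratic equation; it selects the root that minimizes the moment rather than the one that maximizes it. The function name, the mask layout (a list of rows with nonzero object pixels) and the tolerance `eps` are assumptions of this sketch.

```python
import math


def principal_axis(mask, eps=1e-9):
    """Return (centroid, theta0) for the object pixels of mask.

    theta0 is the angle, in radians, of the principal axis of inertia measured
    from the X axis. None is returned when the mask is empty or when the axis
    is undefined (for example, for a circular or square region).
    """
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order moments taken around the centroid (steps S70 to S72).
    m20 = sum((x - cx) ** 2 for x, _ in points)
    m02 = sum((y - cy) ** 2 for _, y in points)
    m11 = sum((x - cx) * (y - cy) for x, y in points)
    if abs(m11) < eps and abs(m20 - m02) < eps:
        return None                            # no unique minimizing angle
    theta0 = 0.5 * math.atan2(2.0 * m11, m20 - m02)   # step S73
    return (cx, cy), theta0


diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
centroid, theta0 = principal_axis(diagonal)
```

For the diagonal mask the axis is the line y = x through the centroid, so `tan(theta0)` equals 1, which also satisfies the quadratic equation above because the moments $m_{20}$ and $m_{02}$ are equal.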

[0149] Then, straight lines which are parallel to the principal axis of inertia and circumscribe the object region, and straight lines which are perpendicular to the principal axis of inertia and circumscribe the object region, are obtained (step S74).
[0150] Referring to FIG. 16B, straight lines 3106 and 3107 are parallel to the principal axis of inertia 3104. The straight lines 3106 and 3107 circumscribe the object region.

[0151] Straight lines 3108 and 3109 are straight lines parallel to the straight line 3105, the straight lines 3108 and 3109 circumscribing the object region.

[0152] The rectangle 3102 is formed by the straight lines 3106, 3107, 3108 and 3109 (step S75). 
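Steps S74 and S75 can be sketched by projecting the object pixels onto the principal axis and onto its perpendicular; the extreme projections correspond to the four circumscribing straight lines. The function name and the argument layout are assumptions of this sketch, and the centroid and the angle are taken as already known.

```python
import math


def oriented_rectangle(points, centre, theta):
    """Return the four vertices of the rectangle that circumscribes points and
    whose sides are parallel and perpendicular to the axis at angle theta."""
    cx, cy = centre
    ux, uy = math.cos(theta), math.sin(theta)    # direction of the principal axis
    vx, vy = -uy, ux                             # direction perpendicular to it
    # Signed coordinates of every point in the (axis, perpendicular) frame.
    along = [(x - cx) * ux + (y - cy) * uy for x, y in points]
    across = [(x - cx) * vx + (y - cy) * vy for x, y in points]
    corners = [(min(along), min(across)), (max(along), min(across)),
               (max(along), max(across)), (min(along), max(across))]
    # Convert the corners back to image coordinates (step S75).
    return [(cx + s * ux + t * vx, cy + s * uy + t * vy) for s, t in corners]


box = oriented_rectangle([(0, 0), (4, 0), (4, 2), (0, 2)], (2, 1), 0.0)
```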

[0153] When the object is circular, the principal axis of inertia cannot be obtained. In this case, a procedure such as that shown in FIG. 14 may be employed to obtain an approximate rectangle.

[0154] The object region can sometimes be expressed more satisfactorily by an ellipse than by a rectangle. FIG. 18 shows an example of a method of obtaining an approximate ellipse from a rectangle when the object region is expressed with the rectangle. FIG. 19 shows an example of a process employed in this case.
[0155] Referring to FIG. 18, an assumption is made that an object region 3300 and a circumscribing rectangle 3301 have been obtained.

[0156] Initially, the inscribing ellipse and the circumscribing ellipse of the approximate rectangle 3301 are obtained (step S80).

[0157] Referring to FIG. 18, an ellipse 3302 is an inscribing ellipse of the rectangle 3301 and an ellipse 3303 is a circumscribing ellipse of the rectangle 3301.

[0158] Then, the size of the inscribing ellipse 3302 is gradually brought closer to that of the circumscribing ellipse 3303 (step S81). An ellipse 3304 which completely includes the object region 3300 is thereby obtained (step S82) and is employed as the approximate ellipse. The unit by which the inscribing ellipse 3302 is enlarged in each iteration of the repeated process may be determined in advance. Alternatively, the unit may be determined in accordance with the difference between the size of the inscribing ellipse 3302 and that of the circumscribing ellipse 3303.

[0159] A reverse method may be employed with which the size of the circumscribing ellipse 3303 is brought closer to the size of the inscribing ellipse 3302. In this case, the circumscribing ellipse 3303 includes the object region 3300 from the beginning. Therefore, the ellipse immediately preceding the first ellipse in the repeated process that fails to include a portion of the object region 3300 is employed as the approximate ellipse 3304.
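The enlargement of steps S80 to S82 can be sketched as below. Several assumptions are made for the illustration only: the rectangle is axis-aligned and given as `(x0, y0, x1, y1)`, the circumscribing ellipse is taken to be the one with the same aspect ratio as the inscribing ellipse (its semi-axes are then larger by a factor of the square root of 2), and the enlargement unit is a fixed fraction `1 / steps` of the difference between the two.

```python
import math


def approximate_ellipse(points, rect, steps=10):
    """Grow the inscribed ellipse of rect until it contains every point.

    Returns (cx, cy, semi_axis_x, semi_axis_y). All points are expected to lie
    inside rect, so the circumscribing ellipse always contains them.
    """
    x0, y0, x1, y1 = rect
    if x1 <= x0 or y1 <= y0:
        raise ValueError("rect must have positive width and height")
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    a, b = (x1 - x0) / 2.0, (y1 - y0) / 2.0     # inscribed ellipse (step S80)
    limit = math.sqrt(2.0)                       # scale of the circumscribing ellipse
    for k in range(steps + 1):                   # gradual enlargement (step S81)
        s = 1.0 + (limit - 1.0) * k / steps
        inside = all(((x - cx) / (s * a)) ** 2 + ((y - cy) / (s * b)) ** 2 <= 1.0 + 1e-9
                     for x, y in points)
        if inside:                               # object fully included (step S82)
            return cx, cy, s * a, s * b
    return cx, cy, limit * a, limit * b


ellipse = approximate_ellipse([(2, 1), (0, 1), (4, 1), (2, 0)], (0, 0, 4, 2))
```

A smaller unit (a larger `steps`) yields a tighter ellipse at the cost of more iterations.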
[0160] Then, the figure-representative-point extracting portion 103 obtains the representative points of the approximate rectangle or the approximate ellipse. The representative points of a rectangle may be the four or three vertices of the rectangle. The representative points of the ellipse may be the vertices of the circumscribing rectangle or two focal points and one point on the ellipse.

[0161] Then, the representative point trajectory curve approximating portion 104 approximates the trajectory of the representative points obtained in the time sequential manner with a spline function or the like. At this time, it is important to bring the time sequences into correspondence with each other. When the approximate figure is a rectangle and its representative points are the vertices, the vertices in adjacent frames must be brought into correspondence with each other.

[0162] FIG. 20 shows an example of a method of a correspondence process. FIG. 21 shows an example of the procedure of the correspondence process.

[0163] Referring to FIG. 20, reference numeral 3500 represents the centroid of the approximate rectangle. A rectangle 3501 in the previous frame and a rectangle 3502 in the present frame have been obtained. Either the rectangle 3501 or the rectangle 3502 is translated to make the centroids coincide with each other (FIG. 20 shows the state in which the centroids coincide). Distances d1 to d4 between the vertices of the two rectangles are calculated, and the sum of the distances is obtained for every combination of the vertices (steps S90 and S91). The combination for which the sum of the distances is shortest is detected to establish the correspondence (step S92).

[0164] The number of combinations to be examined in step S91 can be reduced when the representative points are obtained from the approximate figure in accordance with a predetermined rule. For example, when the coordinates of the vertices of a rectangle are always stored clockwise, only four combinations are required for the correspondence.
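The correspondence process of steps S90 to S92, restricted to the four combinations that remain when vertices are stored in a fixed rotational order, can be sketched as follows. The function name is hypothetical, and each rectangle is assumed to be a list of four `(x, y)` vertices stored in the same rotational direction.

```python
import math


def match_vertices(prev, curr):
    """Reorder the vertices of curr so that curr[i] corresponds to prev[i]."""
    def centred(rect):
        # Translate the rectangle so that its centroid lies at the origin.
        cx = sum(x for x, _ in rect) / 4.0
        cy = sum(y for _, y in rect) / 4.0
        return [(x - cx, y - cy) for x, y in rect]

    p, q = centred(prev), centred(curr)

    def cost(shift):
        # Sum of the distances d1 to d4 for one cyclic combination (step S91).
        return sum(math.dist(p[i], q[(i + shift) % 4]) for i in range(4))

    best = min(range(4), key=cost)               # shortest sum (step S92)
    return [curr[(i + best) % 4] for i in range(4)]


previous = [(0, 0), (4, 0), (4, 2), (0, 2)]
current = [(14, 7), (10, 7), (10, 5), (14, 5)]   # same shape, moved, stored from another vertex
matched = match_vertices(previous, current)
```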

[0165] The foregoing method sometimes has difficulty in establishing the correspondence. When the approximate rectangle has a square-like shape and has been rotated by 45° between the adjacent frames, the correspondence cannot easily be established (because the sums of the distances take similar values for two combinations). In this case, a method may be employed with which the exclusive OR of the object regions in the approximate rectangles is obtained and the combination for which its area is minimized is employed. Another method may be employed with which an absolute difference between textures of the object region is obtained to detect the combination for which the difference is minimized.

[0166] An example will now be described in which when a trajectory of the object region is described by the method 
according to the present invention, the structure of data which is different from the approximate data structure shown in 
FIGS. 6 and 7 is employed. 

[0167] FIG. 22 shows another example of a description format for data of the approximate figure and data of trajectories of representative points of the object region. Note that FIG. 22 shows only one representative point for a portion (the portion from knot number (N) 3902 to a function specifying information arrangement 3913) of data of the trajectory of the representative point (in practice, this portion is described for each of the representative points).

[0168] Description will now be made on the assumption that the highest order of the polynomial is the second order.
[0169] In the foregoing example (shown in FIGS. 5, 6 and 7), all of the coefficients of the polynomial spline function are described. The description method in this example uses a combination of the coordinates of the knots of the spline function and a value relating to the second-order coefficient of the spline function. This description method has the advantage that the knots can easily be extracted, so that the trajectory of a large object can easily be detected.

[0170] The foregoing description method will now be described. 

[0171] The figure type ID 3900 shown in FIG. 22 specifies the type of the figure which has been used to approximate the shape of an object. For example, only the centroid, the rectangle, the ellipse or their combination can be specified. FIG. 23 shows an example of types of the figures and assignment of the figure type ID. A representative point number 3901 indicates the number of the trajectories of the representative points, which is determined in accordance with the type of the figure.

[0172] The knot number (N) 3902 indicates the number of knots of a spline function expressing the trajectory of the representative point. The frame corresponding to each knot is expressed as time and stored in knot time (1) to knot time (N) 3903. Since as many knot times as the number of knots are provided, the knot times are described as a knot time arrangement 3904.

[0173] Also x and y coordinates of each knot are described as arrangements 3906 and 3908 of X coordinate 3905 
of the knot and the Y coordinate 3907 of the knot. 

[0174] A linear function flag 3909 indicates whether or not only a linear function is employed as the spline function between knots. If a second or higher order polynomial is employed even partially, the flag 3909 is turned off. Owing to the flag 3909, description of function specifying information 3910, to be described later, can be omitted when only the linear function is employed as the approximate function. Therefore, the quantity of data can be reduced. Note that the flag may be omitted.

[0175] A function ID 3911 and a function parameter 3912 contained in function specifying information 3910 indicate the order of the polynomial spline function and information for specifying the coefficient of the polynomial spline function, respectively. FIG. 24 shows their examples. Note that ta and tb are the times of consecutive knots, f(t) is a spline function in a region [ta, tb], and fa and fb are the coordinates of the knots at times ta and tb. Since information about the knots is sufficient when a linear polynomial is employed, no function parameter is described. When a quadratic polynomial is employed, one value is described in the function parameter as information for identifying the coefficient. Although the quadratic coefficient is employed in the example shown in FIG. 24, another value, for example, one point on the quadratic curve other than fa and fb, may be employed.

[0176] The foregoing description method is able to regenerate the spline function in all regions in accordance with information about the knots and the function parameter under the limitation conditions shown in FIG. 24.
[0177] As many items of function specifying information 3910 exist as the knot number N - 1, and they are described as an arrangement 3913.
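Regeneration of the trajectory from the knots and the function parameters can be sketched as follows. FIG. 24 is not reproduced in this text, so the exact parameterization is an assumption of this sketch: a quadratic segment is written as the straight line through the two knots plus `a * (t - ta) * (t - tb)`, where `a` is the second-order coefficient stored as the function parameter. The function name, the use of the values 1 and 2 as function IDs, and passing `segments=None` to stand for a linear function flag that is turned on are likewise hypothetical.

```python
from bisect import bisect_right


def evaluate_trajectory(knot_times, knot_values, t, segments=None):
    """Evaluate one coordinate of a representative point at time t.

    segments[i] is (function_id, parameter) for the interval between knot i and
    knot i + 1. segments=None means that every interval is linear.
    """
    if not knot_times[0] <= t <= knot_times[-1]:
        raise ValueError("t lies outside the described interval")
    # Index of the interval [ta, tb] that contains t.
    i = min(bisect_right(knot_times, t) - 1, len(knot_times) - 2)
    ta, tb = knot_times[i], knot_times[i + 1]
    fa, fb = knot_values[i], knot_values[i + 1]
    value = fa + (fb - fa) * (t - ta) / (tb - ta)    # linear part through both knots
    if segments is not None:
        function_id, a = segments[i]
        if function_id == 2:                          # quadratic segment
            value += a * (t - ta) * (t - tb)          # vanishes at both knots
    return value


times = [0, 10, 20]
xs = [0.0, 10.0, 0.0]
specs = [(1, None), (2, -0.1)]        # N - 1 = 2 items of function specifying information
x_at_15 = evaluate_trajectory(times, xs, 15, specs)
```

Because the quadratic term vanishes at both knots, the curve always passes through the stored knot coordinates, which is what allows the knots alone to give a quick view of the trajectory.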

[0178] Although the description has been made on the assumption that the highest order of the polynomial is the second order, the highest order of the polynomial may, of course, be the third or a higher order.
[0179] Variations of related information will now be described.
[0180] FIG. 25 shows an example of the structure of data 4200 about related information for use in a monitor video. Actual data is required to contain at least one item.

[0181] An object type 4201 is data indicating the type, such as a "vehicle" or a "person", of an object to which approximation is made.

[0182] Identification information 4202 is data for identifying an actual object, such as "name of a person", "the license number of a vehicle" or "the type of the vehicle".

[0183] An operation content 4203 is data indicating the operation, such as "walking" or "running", of the object.
[0184] FIG. 26 shows an example of the structure of data 4300 about related information mainly for use in commercial contents or hyper media contents. Actual data is required to contain at least one item.
[0185] Name 4301 is data indicating the name of the object. In a case where the object is a character of a movie or the like, the name of the player or the actor is specified.

[0186] Copyright information 4302 is data indicating information relating to the copyright of a copyright holder of the 
object. 

[0187] Copy permission information 4303 is data indicating whether or not video information in a range contained in the figure approximating the object is permitted to be cut and re-used.
[0188] A foot mark 4304 is data indicating the time at which the object was last edited.

[0189] URL 4305 of related information is data expressing, as a URL, the data to which a reference must be made when related information of the object is displayed.

[0190] Access limit information 4306 is data indicating, for each object, permission or inhibition of viewing and of jumping via a hyper link, and data for setting the permission conditions.
[0191] Billing information 4307 is data indicating billing information for each object.
[0192] Annotation data 4308 is data for aiding related information of the object and the operation.
[0193] Since only a relatively small number of the related information items shown in FIGS. 25 and 26 exist, it is preferable that related information is described in object region data.

[0194] A method of providing video data and object region data will now be described.

[0195] When object region data produced by the process according to this embodiment is provided for a user, a creator must provide the object region data for the user by some method. The object region data may be provided by any one of the following methods.

(1) A method with which video data, its object region data and its related information are recorded in one (or a plurality of) recording medium so as to be provided simultaneously.

(2) A method with which video data and object region data are recorded in one (or a plurality of) recording medium so as to be provided simultaneously, while related information is provided separately or is not provided (in the latter case, related information can be acquired separately through a network or the like).

(3) A method with which video data is provided alone, while object region data and related information are recorded in one (or a plurality of) recording medium so as to be provided simultaneously.

(4) A method with which video data, object region data and related information are provided individually.

[0196] In the foregoing cases, the recording medium is mainly used to provide the data. Another method may be employed with which a portion or the whole of the data is provided through a communication medium.
[0197] As described above, the structure according to this embodiment is able to describe the object region in a video as a parameter of a curve approximating the time-sequential trajectory (the trajectory of the coordinates of the representative points having the frame numbers or time stamps as the variables) of the coordinates of the representative points of the approximate figure of the object region.

[0198] Since this embodiment enables the object region in a video to be expressed with only the parameters of the function, object region data which is small in quantity and easy to handle can be generated. Moreover, expression of the representative points and generation of the parameters of the approximate curve can easily be performed.

[0199] According to this embodiment, whether or not arbitrary coordinates specified by a user indicate the object region can be determined very easily. As a result, a moving object in a video can be specified still more easily.
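The determination mentioned above can be sketched for a rectangular approximate figure. The vertex positions at the frame of interest are assumed to have been obtained already by evaluating the trajectory functions of the representative points; the function name and the use of a cross-product sign test for convex figures are choices made for this illustration, not a procedure stated in the text.

```python
def point_in_convex_polygon(point, vertices):
    """Return True when point lies inside or on the boundary of the convex
    polygon whose vertices are listed in a consistent rotational order."""
    px, py = point
    sign = 0
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        # The sign of the cross product tells on which side of edge a-b the point lies.
        cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        if cross == 0:
            continue                      # the point is on the line through this edge
        if sign == 0:
            sign = 1 if cross > 0 else -1
        elif (cross > 0) != (sign > 0):
            return False                  # the point is on different sides of two edges
    return True


rectangle_at_frame = [(0, 0), (4, 0), (4, 2), (0, 2)]
hit = point_in_convex_polygon((1, 1), rectangle_at_frame)
```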

[0200] Other embodiments of the object-region-data generating apparatus according to the present invention will be described. The same portions as those of the first embodiment will be indicated by the same reference numerals and their detailed description will be omitted.

Second Embodiment 

[0201] The first embodiment has the structure that the representative points of a figure approximating the object region in a video are extracted so as to be converted into object region data. On the other hand, a second embodiment has a structure that characteristic points in the object region in the video are extracted so as to be converted into object region data.

[0202] Description will be made about the different structures from those according to the first embodiment. 

[0203] FIG. 27 shows an example of the structure of an object-region-data generating apparatus according to this embodiment. As shown in FIG. 27, the object-region-data generating apparatus according to this embodiment incorporates a video data storage portion 230, a characteristic-point extracting portion 233, a characteristic-point-curve approximating portion 234 for approximating the arrangement of characteristic points with a curve, a related information storage portion 235 and a region data storage portion 236.

[0204] Referring to FIG. 27, the video data storage portion 230 has the same function as that of the video data storage portion 100 according to the first embodiment. The related information storage portion 235 has the same function as that of the related information storage portion 105 according to the first embodiment. The region data storage portion 236 has the same function as that of the region data storage portion 106 according to the first embodiment.
[0205] The characteristic-point extracting portion 233 extracts at least one characteristic point from the object region in the video. The characteristic point may be any one of a variety of points. For example, corners of an object (obtained by, for example, a method disclosed in "Gray-level corner detection", L. Kitchen and A. Rosenfeld, Pattern Recognition Letters, No. 1, pp. 95-102, 1982) or the centroid of the object may be employed. When the centroid of the object is employed as the characteristic point, it is preferable that the portion around the point assumed to be the centroid is specified and then automatic extraction is performed.

[0206] The characteristic-point-curve approximating portion 234 has a basic function similar to that of the representative point trajectory curve approximating portion 104 according to the first embodiment. That is, the characteristic-point-curve approximating portion 234 time-sequentially approximates, with a curve, the positions of the characteristic points extracted by the characteristic-point extracting portion 233. For each of the X and Y coordinates, the approximate curve is expressed as a function of the frame number f or the time stamp t given to the video, and is obtained by linear approximation or approximation using a spline curve. Data after the approximation has been performed is recorded by a method similar to that according to the first embodiment.
[0207] Note that object region data according to this embodiment is basically similar to object region data according to the first embodiment (see FIG. 5). The field for the approximate figure number is not required. Note that "data of the approximate figure" is "data of characteristic points".

[0208] Also data of the characteristic point in object region data is basically similar to data of the approximate figure according to the first embodiment (see FIG. 6). Note that the "number of representative points" is the "number of characteristic points". The "data of the trajectory of representative points" is the "data of the trajectory of characteristic points". Note that figure type ID is not required.

[0209] Data of the trajectory of the characteristic points included in the data of the characteristic points is similar to 
data of the trajectory of the representative points according to the first embodiment (see FIG. 7). 
[0210] FIG. 28 is a flowchart showing an example of a flow of the process of the object-region-data generating apparatus according to this embodiment. The overall flow is similar to that according to the first embodiment. In step S21, video data for one frame is extracted from the video data storage portion 230 similarly to step S11 shown in FIG. 10. Steps S12 to S14 shown in FIG. 10 are replaced by step S22 for extracting the characteristic points of the object of interest. Step S15 shown in FIG. 10, in which the positions of the representative point train of the approximate figure in the successive frames are approximated with a curve, is replaced by step S23, in which the positions of the characteristic point train of the object region in the successive frames are approximated with a curve. Moreover, steps S24 and S25 are similar to steps S16 and S17 shown in FIG. 10.

[0211] As a matter of course, the process according to this embodiment can be realized by software.
[0212] As described above, the structure according to this embodiment is able to describe the object region in a
video as a parameter of a curve approximating the time-sequential trajectory (the trajectory of the coordinates of the
characteristic points having the frame numbers or time stamps as the variables) of the characteristic points of the
region.

[0213] Since this embodiment enables the object region in a video to be expressed with only the parameters of the
function, object region data which is small in quantity and which can easily be handled can be generated.
Moreover, expression of the characteristic points and generation of the parameters of the approximate curve can easily
be performed.

[0214] According to this embodiment, whether or not arbitrary coordinates specified by a user indicate the object
region can be determined considerably easily. As a result, specification of a moving object in a video
can be performed even more easily.

[0215] Note that object region data based on the representative points of the approximate figure of the object region
according to the first embodiment and object region data based on the characteristic points of the object region
according to the second embodiment may be mixed with each other.

[0216] In the foregoing case, the format of object region data according to the first embodiment is provided with a
flag for identifying whether object region data is based on the representative points of the approximate figure of the
object region or on the characteristic points of the object region. As an alternative to providing the flag in the format of
object region data according to the first embodiment, a specific value of the figure type ID may indicate that object
region data is based on the characteristic points of the object region. In the other cases, the figure type ID indicates
that object region data is based on the representative points of the approximate figure of the object region.
[0217] The structure of object region data and a creating side have been described. The portion for using the 
above-mentioned object region data will now be described. 

Third Embodiment

[0218] In the third embodiment, when object region data including related information has been given to the object 
in the video, a user specifies an object (mainly on a GUI screen) to display related information (display of characters, a 
still image or a moving image, or output of sound) or causes a related program to be executed. 
[0219] FIG. 29 shows an example of the structure of a video processing apparatus according to this embodiment.
As shown in FIG. 29, the video processing apparatus according to this embodiment incorporates a video data display
portion 301, a control unit 302, a related information display portion 303 and an instruction input portion 304.
[0220] The video data display portion 301 displays video data input from a recording medium or the like (not shown)
on a liquid crystal display unit or a CRT.

[0221] The instruction input portion 304 permits a user to use a pointing device, such as a mouse, or a keyboard to
perform an operation, for example, specification of an object in the video displayed on the liquid crystal display unit or
the CRT. Moreover, the instruction input portion 304 receives input (specification of an object) from the user.

[0222] The control unit 302, to be described later, determines whether or not the user has specified the object in
the video in accordance with, for example, the coordinates specified by the user on the screen and object region data
input from a recording medium (not shown). Moreover, the control unit 302 refers to the pointer for pointing
related information of object region data when a determination has been made that the user has specified a certain
object in the video. Thus, the control unit 302 acquires related information of the object so that the related information
is displayed.

[0223] The related information display portion 303 responds to the instruction issued from the control unit 302 to 
acquire and display related information (from a recording medium or a server or the like through a network). 
[0224] When the pointer for pointing related information is a pointer for pointing data in which a program or an
operation of the computer is described, the computer performs a predetermined operation.
[0225] As a matter of course, this embodiment may also be realized by software.

[0226] A process which is performed when the object region is expressed as an approximate figure similarly to the 
first embodiment will now be described. 

[0227] FIG. 30 shows an example of the process according to this example. The flowchart shown in FIG. 30
includes only a process which is performed when a certain region in a video which is being displayed during
reproduction of the video is specified by using a pointing device, such as a mouse cursor (basically corresponding to
the process which is performed by the control unit 302).

[0228] In step S31, the coordinates on the screen specified by using the pointing device or the like are calculated.
Moreover, the frame number of the video which is being reproduced at the moment of the instruction is acquired. Note
that a time stamp may be employed as a substitute for the frame number (hereinafter the frame number is employed).

[0229] In step S32, an object existing in the video at the frame number at which the object has been specified
is selected from object region data of the objects added to the video. The foregoing selection can easily be performed by
making a reference to the leading frame number and the trailing frame number of object region data.
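The selection in step S32 can be sketched as a simple interval check. The class `ObjectRegion` and its field names below are hypothetical stand-ins for the leading and trailing frame numbers held in object region data; the actual record layout is the one described with reference to FIG. 5.

```python
from dataclasses import dataclass


@dataclass
class ObjectRegion:
    # Hypothetical subset of the fields of object region data.
    object_id: int
    leading_frame: int
    trailing_frame: int


def objects_in_frame(regions, frame):
    """Return the regions whose frame interval contains the given frame."""
    return [r for r in regions
            if r.leading_frame <= frame <= r.trailing_frame]
```

For example, with regions covering frames 0 to 100 and 50 to 200, a click at frame 75 selects both objects, and the remaining steps are then run for each of them in turn.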
[0230] In step S33, data of a spline function (see FIGS. 6 and 7) extracted from object region data of the region
selected in step S32 is used to calculate the coordinates of the representative points of the approximate figure at the
frame number which was being displayed when the object was specified. Thus, two focal points F and G and point H on the
ellipse are obtained in the example according to the first embodiment (see FIGS. 2 and 4).

[0231] In step S34, it is determined whether or not the coordinates specified by using the pointing device or the like
exist in the object (that is, the approximate figure) in accordance with the discrimination procedure which is decided
according to the obtained representative points and the figure type ID of object region data.
[0232] When the ellipse is represented by the two focal points and one point on the ellipse similarly to the first
embodiment, the determination can easily be made.

[0233] When, for example, the Euclidean distance between a point P and a point Q is expressed by E(P, Q) similarly
to the first embodiment, the following inequality holds in a case where the coordinate P specified by using the pointing
device exists in the ellipse:

\[ E(F, P) + E(G, P) \le E(F, H) + E(G, H) \]

[0234] In a case where the coordinate P exists on the outside of the ellipse, the following inequality holds:

\[ E(F, P) + E(G, P) > E(F, H) + E(G, H) \]

[0235] The foregoing inequalities are used to determine whether or not the specified point exists in the object.
Then, it is determined whether step S35 is performed or omitted (skipped) in accordance with a result of the
determination.
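The ellipse test of step S34 can be sketched directly from the two inequalities. The function name `in_ellipse` is an assumption made for illustration; F, G, H and P are the two focal points, the point on the ellipse and the specified coordinate, each given as an (x, y) pair.

```python
import math


def in_ellipse(F, G, H, P):
    """True when P lies inside or on the ellipse with foci F and G that
    passes through H, i.e. E(F, P) + E(G, P) <= E(F, H) + E(G, H)."""
    E = math.dist  # Euclidean distance between two points
    return E(F, P) + E(G, P) <= E(F, H) + E(G, H)
```

Because the comparison uses floating-point sums, a point lying exactly on the boundary may fall on either side in general; an implementation that needs a stable boundary could add a small tolerance.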

[0236] When a parallelogram is employed as the approximate figure of the object region in the video, four inequalities
are used as a substitute for the one inequality in the case of the ellipse to determine whether or not the arbitrary
coordinates exist in the object.

[0237] When, for example, points A, B and C shown in FIG. 8 are representative points, point D is obtained as
follows:

\[ D = C + A - B \]

[0238] Then, an assumption is made that a point on a straight line passing through the points A and B is Q and the
straight line is expressed by the following equation:

\[ f_{A,B}(Q) = 0 \]

[0239] When the point P exists in the figure, the following two inequalities are simultaneously held:

\[ f_{A,B}(P) \cdot f_{C,D}(P) < 0 \quad \text{and} \quad f_{B,C}(P) \cdot f_{D,A}(P) < 0 \]

where

\[ f_{A,B}(P) = (y_A - y_B)(x - x_A) - (x_A - x_B)(y - y_A) \]
\[ f_{B,C}(P) = (y_B - y_C)(x - x_B) - (x_B - x_C)(y - y_B) \]
\[ f_{C,D}(P) = (y_D - y_C)(x - x_C) - (x_D - x_C)(y - y_C) \]
\[ f_{D,A}(P) = (y_A - y_D)(x - x_D) - (x_A - x_D)(y - y_D) \]

and

\[ P = (x, y),\ A = (x_A, y_A),\ B = (x_B, y_B),\ C = (x_C, y_C),\ D = (x_D, y_D) \]
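The parallelogram test can be sketched as follows, assuming the reading of the four line functions in which each pair of opposite sides is evaluated with the same direction vector, so that a point between the two sides gives values of opposite sign. The helper `line_value` and the function name `in_parallelogram` are illustrative assumptions.

```python
def line_value(direction, origin, P):
    """d_y * (x - x0) - d_x * (y - y0): zero on the line through origin
    with the given direction, and of opposite sign on its two sides."""
    return (direction[1] * (P[0] - origin[0])
            - direction[0] * (P[1] - origin[1]))


def in_parallelogram(A, B, C, P):
    """True when P lies strictly inside the parallelogram A, B, C, D,
    where D = C + A - B is derived from the three representative points."""
    D = (C[0] + A[0] - B[0], C[1] + A[1] - B[1])
    sub = lambda U, V: (U[0] - V[0], U[1] - V[1])
    f_ab = line_value(sub(A, B), A, P)
    f_cd = line_value(sub(D, C), C, P)  # D - C equals A - B
    f_bc = line_value(sub(B, C), B, P)
    f_da = line_value(sub(A, D), D, P)  # A - D equals B - C
    return f_ab * f_cd < 0 and f_bc * f_da < 0
```

With strict inequalities a point lying exactly on an edge is treated as outside; whether the boundary should count as inside is a design choice the text leaves open.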

[0240] When one object is approximated with a plurality of approximate figures (refer to the approximate
figure number shown in FIGS. 5 and 12), the foregoing process is performed for each approximate figure.
[0241] Step S35 is a process which is performed only when the specified point exists in the object region. In the
foregoing case, a reference to the related information pointer 403 for pointing related information contained in object
region data (see FIG. 5) is made. In accordance with information about the pointer, related information is acquired so
as to be, for example, displayed (in the example of the structure shown in FIG. 29, the foregoing process is performed
by the related information display portion 303). When a program has been specified as related information, the specified
program is executed or another specified operation is performed. When related information itself has been described in
object region data, that related information is displayed.

[0242] FIG. 31 shows an example of a case where a description (a text) of an object in a video has been given as the
related information. When the coordinates specified by using the pointing device 802 during reproduction of a video 800
exist in the object region 801 (a figure approximating the object 801), related information 803 is displayed in a
separate window.

[0243] In step S36, a branching process is performed in which it is determined whether or not another object having
object region data exists in the frame in which the object has been specified. If such an object exists, the process
returns to step S32. If no such object exists, the operation is completed.

[0244] When a plurality of regions overlap, either or both of the regions may arbitrarily be selected. 
[0245] A process which is performed when the object region is expressed as characteristic points of the object sim- 
ilarly to the second embodiment will now be described. 

[0246] The portions different from those according to the first embodiment will mainly be described. 

[0247] FIG. 32 shows an example of the procedure according to this example. Note that the flowchart shown in FIG.
32 includes only a process (basically corresponding to the process which is performed by the control unit 302) which
is performed when a certain region in a video which is being displayed during reproduction of the video has been
specified by using a pointing device, such as a mouse cursor. Since the overall flow is similar to that of the flowchart
shown in FIG. 30, different portions will mainly be described (steps S41, S42, S45 and S46 are similar to steps S31, S32,
S35 and S36).

[0248] In step S43, the coordinates of the position of the characteristic point of an object in a displayed frame 
number are calculated from object region data. When a plurality of characteristic points exist, the coordinates of all of 
the characteristic points are calculated. 

[0249] In step S44, the distance between the position of the characteristic point calculated in step S43 and the
coordinates specified by clicking is calculated for all of the characteristic points. Then, it is determined whether or not
one or more characteristic points exist at a distance shorter than a predetermined threshold value. Alternatively,
a process for calculating the distance for a certain characteristic point and comparing the distance with the
predetermined threshold value is repeated, and the process is interrupted when one characteristic point at a distance
shorter than the threshold value is detected. If one or more characteristic points at a distance shorter
than the threshold value exist, the process proceeds to step S45. If no such characteristic point exists,
the process proceeds to step S46.
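The alternative form of step S44, in which the comparison stops at the first characteristic point found within the threshold, can be sketched as follows. The function name and the argument layout are assumptions made for illustration; `any` stops evaluating as soon as one point satisfies the condition, which corresponds to interrupting the process.

```python
import math


def hits_characteristic_point(points, clicked, threshold):
    """True when at least one characteristic point lies at a distance
    shorter than threshold from the clicked coordinates."""
    return any(math.dist(p, clicked) < threshold for p in points)
```

A true result corresponds to proceeding to step S45, and a false result to proceeding to step S46.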

[0250] As a result of the foregoing process, display of related information can be performed in accordance with the
coordinates of the characteristic point of the object when a portion adjacent to the region of interest has been
specified by an operation using a pointing device or the like.



Fourth Embodiment 

[0251] A fourth embodiment will now be described with which an object region having related information which can
be displayed is clearly displayed (communicated to a user) by using object region data. In the foregoing case, the object
having related information which can be displayed must previously be supplied with object region data including a
pointer for pointing the related information.

[0252] The block structure of this embodiment is similar to that according to, for example, the third embodiment (see
FIG. 29).

[0253] As a matter of course, also this embodiment can be realized by software. 

[0254] A case in which the object region has been expressed as an approximate figure similar to the first embodi- 
ment will now be described. 

[0255] FIG. 33 shows an example of a process according to this embodiment. 
[0256] An example case in which the approximate figure is an ellipse will now be described. As a matter of course,
a similar process is performed in a case of another approximate figure.

[0257] In step S51, the frame number of a video which is being displayed is acquired. Note that a time stamp may
be employed as a substitute for the frame number (hereinafter the frame number is employed).
[0258] In step S52, an object existing in the video at the frame number acquired in step S51 is selected.
The selection is performed by detecting object region data given to the video for which the displayed frame number lies
between the leading frame number and the trailing frame number.

[0259] In step S53, data of a spline function (see FIGS. 6 and 7) is extracted from object region data of the object
selected in step S52. Then, the coordinates of representative points of an approximate figure (or a region having related
information) in the displayed frame are calculated.
[0260] In step S54, a reference to the figure type ID of object region data is made to obtain an approximate figure
expressed by the representative points calculated in step S53. Then, display of the image in each approximate figure
(for example, an ellipse region) is changed.

[0261] The change may be performed by a variety of methods. When the approximate figure is, for example, an
ellipse, the brightness of the image in the ellipse region is intensified by a predetermined value. Assuming that the
degree of intensification is ΔY, the brightness before the change of the display is Y and an upper limit of the brightness
which can be displayed is Ymax, each pixel in the ellipse is displayed with brightness of MIN(Y + ΔY, Ymax). Pixels on
the outside of the ellipse are displayed with brightness of Y. Note that MIN(a, b) is a function taking the smaller value
of a and b.
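The brightness change can be sketched on a small luminance array. The function name `intensify`, the nested-list image layout, the Boolean mask marking pixels inside the approximate figure, and the default upper limit of 255 are all assumptions made for illustration.

```python
def intensify(luma, inside, delta_y, y_max=255):
    """Return a copy of luma in which every pixel flagged in the mask is
    displayed with MIN(Y + delta_y, y_max); other pixels keep Y."""
    return [
        [min(y + delta_y, y_max) if flagged else y
         for y, flagged in zip(row, mask_row)]
        for row, mask_row in zip(luma, inside)
    ]
```

The mask itself could be produced by applying the inside test for the approximate figure (for example, the ellipse inequality) to each pixel position.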

[0262] FIGS. 34A and 34B show an example in which the object region is displayed by the method with which the
brightness is intensified (in FIGS. 34A and 34B, hatching indicates no change in the brightness and no hatching
indicates intensified brightness). FIG. 34A shows a screen 1000 which is in a state in which the display change process in
step S54 has not been performed. Reference numeral 1001 represents an object having object region data in the video.
A screen 1002 shown in FIG. 34B is displayed after the change in the display in step S54 has been performed.
Reference numeral 1003 represents an ellipse region approximating the object region 1001. Only the inside portion
of the ellipse region 1003 is displayed brighter. Thus, the user can recognize that the object is an object which permits
display or the like of related information.

[0263] When one object is approximated with a plurality of approximate figures (refer to approximate figure number 
shown in FIGS. 5 and 12), the foregoing process is performed for each approximate figure. 

[0264] In step S55, it is determined whether or not another object, the display of which must be changed, exists.
That is, a determination is made whether or not a non-processed object exists for which the displayed frame number is
between the leading frame number and the trailing frame number. If the non-processed object exists, the process from step
S52 is repeated. If no object of the foregoing type exists, the process is completed.

[0265] As described above, display of an object region having the related information among the regions of the
object in the video which is specified by using object region data is changed. Thus, whether or not the related
information exists can quickly be detected.

[0266] The method of indicating the object region which permits display or the like of related information is not
limited to the above-mentioned method with which the brightness in the object region is changed. Any one of a variety of
methods may be employed. A variety of the methods will now be described. The procedure of each process using object region
data is basically similar to the flowchart shown in FIG. 33, with step S54 changed to a corresponding process.

[0267] A display method shown in FIG. 35 is a method of displaying the position of an object having related
information on the outside of an image 1600. Reference numerals 1601 and 1602 represent objects having related
information. Reference numerals 1603 and 1604 represent bars for displaying the position of the object in the direction of
the axis of ordinate and in the direction of the axis of abscissa. Display 1605 and display 1606 correspond to the object
1601 having related information. FIG. 35 shows a structure in which bars serving as marks are displayed in the regions
onto which the region 1601 is projected in the direction of the axis of ordinate and in the direction of the axis of abscissa.
Similarly, reference numerals 1607 and 1608 represent bars for displaying the object region 1602.
[0268] A state of projection of the object region in the foregoing directions can easily be obtained by using the
coordinates of the representative points of the approximate figure in the frame obtained from data of the approximate figure
of object region data and the figure type ID as described in the embodiments.
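For an approximate figure that is a polygon, such as the parallelogram whose fourth vertex is derived from three representative points, the projection onto the two axes reduces to the minimum and maximum of the vertex coordinates. The sketch below assumes that case; the function name `projection_bars` is hypothetical, and an ellipse would need its own extent calculation, which is not shown here.

```python
def projection_bars(vertices):
    """Return ((x_min, x_max), (y_min, y_max)): the extents of a polygon
    projected onto the axis of abscissa and the axis of ordinate."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), max(xs)), (min(ys), max(ys))
```

Each returned pair gives the start and end of the mark to draw on the corresponding bar outside the image.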

[0269] It is preferable that the region of a different object is indicated with a bar displayed in a different manner (for 
example, a different color). 

[0270] The method according to this embodiment enables a user to specify the inside portion of the image with a
pointing device by referring to the bars 1603 and 1604 displayed in the vertical and horizontal directions on the outside
of the image 1600. Thus, related information can be displayed.

[0271] It is preferable that the region of a different object is indicated with a bar displayed in a different manner (for
example, a different color).

[0272] FIG. 36 shows another display method with which the position of an object having related information is
displayed on the outside of an image 1700. Objects 1701 and 1702 each having related information exist in the image
1700. The position of the object having related information is indicated by object-position indicating bars 1703 and
1704. As distinct from the example shown in FIG. 35, each display bar indicates only the position of the centroid of the
object as a substitute for the object region. Circles 1705 and 1706 indicate the centroid of the object 1701. Circles 1707
and 1708 indicate the centroid of the object 1702.
[0273] The centroid of the object region in the foregoing directions can also easily be obtained in accordance with
the coordinates of the representative points of the approximate figure in the frame obtained from data of the approximate
figure of object region data and the figure type ID.
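As an illustration of how the centroid follows from the representative points, the sketch below covers the two figure types discussed for the third embodiment: for an ellipse given by its focal points the centroid is the midpoint of the foci, and for a parallelogram with consecutive vertices A, B, C it is the midpoint of the diagonal from A to C. The function names are assumptions made for illustration.

```python
def midpoint(U, V):
    """Midpoint of two (x, y) points."""
    return ((U[0] + V[0]) / 2, (U[1] + V[1]) / 2)


def ellipse_centroid(F, G):
    """Centre of an ellipse described by its two focal points F and G."""
    return midpoint(F, G)


def parallelogram_centroid(A, C):
    """Centre of a parallelogram with consecutive vertices A, B, C, D:
    the midpoint of the diagonal joining the opposite vertices A and C."""
    return midpoint(A, C)
```

The x and y components of the result give the positions of the circles on the horizontal and vertical bars respectively.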

[0274] The foregoing method provides a display which can easily be recognized because the size of the display on the
object-position indicating bar can be kept small even if the object has a large size or many objects exist.

[0275] FIG. 37 shows an example of a display method with which a related information list is displayed on the
outside of an image 1800. The image 1800 contains objects 1801 and 1802 each having related information. Reference
numeral 1803 represents a list of objects each having related information. The list 1803 shows information of objects
each having related information in the image frame which is being displayed. In the example shown in FIG. 37, names
of objects are displayed which are obtained as a result of retrieving related information from object region data of the
objects existing in the frame.

[0276] The foregoing method permits a user to cause related information to be displayed by specifying the name
shown in the related information list 1803 as well as by specifying the region 1801 or 1802 with the pointing device.
Since specifying the number shown in the list 1803 also enables related information to be displayed, the foregoing
structure can be employed in a case of a remote control having no pointing device.

[0277] FIG. 38 shows a display method with which objects 1901 and 1902 existing in an image 1900 and each
having related information are indicated with icons 1903 and 1904 to indicate existence of related information. The icon
1903 corresponds to the object 1901, while the icon 1904 corresponds to the object 1902.

[0278] Each icon can be displayed by obtaining an approximate figure as described above, by cutting a rectangular
region having a predetermined size including the obtained approximate figure from video data in the frame and by, for
example, reducing the cut rectangular region as appropriate.

[0279] The foregoing method enables related information to be displayed by directly specifying the icon as well as 
specifying the object region in the video. 

[0280] FIG. 39 shows an example of a display method configured to display a map indicating the object region
having related information so as to indicate existence of related information. An image 2000 includes objects 2001 and
2002 each having related information. Reference numeral 2003 represents a map of the regions of the objects each
having related information. The map 2003 indicates the positions of the regions of the objects each having related
information in the image 2000. Reference numeral 2004 represents the object 2001, while reference numeral 2005
represents the object 2002.

[0281] The map 2003 has a form obtained by reducing the image 2000 and is arranged to display only the images of
the object regions (only the approximate figures obtained as described above are displayed at the corresponding
positions in the reduced image).

[0282] The foregoing method enables related information to be displayed by specifying the object region 2004 or 
2005 displayed on the map 2003 as well as direct specification of an object in the image 2000. 
[0283] FIGS. 40A and 40B show an example of the display method with which specification, by using a pointing
device, of an object existing in the image and having related information is facilitated by controlling the reproduction
rate of the image in accordance with the position of the mouse cursor. Reference numerals 2100 and 2102 represent the
entire display screens and reference numerals 2101 and 2103 represent regions on the display screens on which images are
being displayed. In the display screen 2100 shown in FIG. 40A, a mouse cursor 2104 is positioned on the outside of the
image 2101 so that the image is reproduced at a normal display rate (frames/second) (or reproducing speed). In the
display screen 2102 shown in FIG. 40B, the mouse cursor 2105 exists in the image region 2103. Therefore, the display rate of
the image is lowered or the displayed image is frozen.

[0284] Another structure may be employed as a substitute for the above-mentioned structure in which the image
display rate is always lowered or the displayed image is frozen when the mouse cursor has entered the image region. That
is, whether or not an object having related information exists in the frame is determined (the determination is made by
comparing the frame number with the leading frame number and the trailing frame number). If the object having
related information exists in the frame, the image display rate is lowered or the displayed image is frozen.
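The variant in which the rate is lowered only when the cursor is in the image region and an object having related information exists in the frame can be sketched as below. The function name, the rectangle layout `(left, top, width, height)`, the list of `(leading, trailing)` frame pairs and the two rate values are assumptions made for illustration; a lowered rate of zero would correspond to freezing the displayed image.

```python
def display_rate(cursor, image_rect, frame, frame_ranges,
                 normal=30.0, lowered=10.0):
    """Choose the display rate (frames/second) for the current frame."""
    left, top, width, height = image_rect
    x, y = cursor
    cursor_inside = (left <= x < left + width
                     and top <= y < top + height)
    # An object exists in the frame when the frame number lies between
    # its leading frame number and its trailing frame number.
    object_present = any(lead <= frame <= trail
                         for lead, trail in frame_ranges)
    return lowered if cursor_inside and object_present else normal
```

Dropping the `object_present` condition gives the simpler structure in which the rate is always lowered while the cursor is inside the image region.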
[0285] For example, an object which is moving at high speed in the video sometimes cannot easily be specified by
using the mouse cursor. The foregoing method is arranged to change the reproducing speed according to the position
of the mouse cursor. Thus, movement of the object can be slowed when the object is specified or the displayed image
can be frozen. As a result, the object can easily be specified.

[0286] FIG. 41 shows an example of the display method with which an object existing in the image and having
related information can easily be specified by using the pointing device. Reference numeral 2500 represents an image
which is being reproduced. Reference numeral 2501 represents a button for acquiring an image. When the button 2501
is depressed with a mouse pointer 2502, the image which was being displayed at that time is acquired so
as to be displayed on an acquired-image display portion 2503. The image 2500 is continuously reproduced even after
the foregoing instruction has been performed with the button 2501. Since the acquired image is displayed on the
acquired-image display portion 2503 for a while, specifying an object which is being displayed in the acquired-image
display portion 2503 enables related information of the specified object to be displayed.

[0287] The button 2501 for acquiring an image may be omitted. A structure may be employed from which the button 
2501 is omitted and with which an image can automatically be acquired when the mouse cursor 2502 enters the video 
display portion 2500. 

[0288] A structure may be employed with which whether or not an object having related information exists in the
frame is determined when the button 2501 has been depressed or the mouse cursor has entered the image region (for
example, a determination is made by comparing the frame number with the leading frame number and the trailing frame
number). Only when the object having related information exists in the frame, the image is acquired so as to be
displayed.

[0289] The foregoing method enables related information to easily be specified from a still image which is being
displayed on the acquired-image display portion 2503.

[0290] The foregoing variations may be employed. Another method may be employed with which the region of an 
image which permits display or the like of related information is clearly displayed. Also a method may be employed with 
which instruction is facilitated. Thus, a variety of methods for aiding the operation of the user may be employed. 
[0291] A case in which the object region is expressed as characteristic points of the object similarly to the second
embodiment will now be described.

[0292] Portions different from those according to the first embodiment will mainly be described. 
[0293] A flowchart is, in the foregoing case, a flowchart which is basically similar to that shown in FIG. 33 except for 
characteristic points being employed as a substitute for the representative points. Specifically, the coordinates of char- 
acteristic points of the approximate figure are calculated in step S53. 

[0294] FIG. 34 shows a structure in which the brightness in the approximate figure 1003 corresponding to the object
1001 is intensified. If three or more characteristic points exist in the foregoing case, a polygon having the vertices which
are the characteristic points may be formed. Moreover, the brightness of the inside portion of the polygon may be
intensified. If two or more characteristic points exist, a figure of some kind may be formed which has the representative
points which are the characteristic points. Moreover, the brightness in the figure may be intensified. Alternatively, a
figure, such as a circle, the center of which is each of the characteristic points and which has a somewhat large size, is
formed. Moreover, each of the formed figures which must be displayed is made conspicuous by means of brightness,
color or blinking.

[0295] The structure shown in FIG. 35 is arranged such that projection of the approximate figures corresponding to
the objects 1601 and 1602 in the vertical and horizontal directions is displayed as the bar set 1605 and 1607 or the bar
set 1606 and 1608. If three or more characteristic points exist in the foregoing case, a polygon having the vertices which
are the characteristic points may be formed. Moreover, projection of the polygon in the directions of the two axes may
be displayed as the bars. If two or more characteristic points exist, a rectangle having the vertices which are the
characteristic points may be formed. Moreover, projection into the directions of the two axes may be displayed as the bars.
If one characteristic point exists, the foregoing method shown in FIG. 36 may be employed with which the position of
the centroid is displayed with circles in the bars.

[0296] FIG. 38 shows the structure with which the image of an object is extracted by cutting in accordance with the
approximate figure or the like so as to be displayed as an icon. Also in the foregoing case, the image of an object can
be extracted by cutting in accordance with the characteristic points so as to be displayed as an icon.
[0297] FIG. 39 shows a structure that the approximate figures 1903 and 1904 are displayed in a map. Also in the
foregoing case, a figure of some kind formed in accordance with characteristic points as described above may be
displayed as a map.

[0298] The methods shown in FIGS. 37, 40 and 41 may be employed in the foregoing case.
[0299] The foregoing variations may be employed. Another method may be employed with which the region of an
image which permits display or the like of related information is clearly displayed. Also a method may be employed with
which instruction is facilitated. Thus, a variety of methods for aiding the operation of the user may be employed.

Fifth Embodiment 

[0300] A fifth embodiment will now be described with which an object in a video is retrieved. 
[0301] The block structure according to this embodiment is similar to that according to the third embodiment (see
FIG. 29). Note that the structure shown in FIG. 29 may be arranged such that the related information display portion is
omitted (for example, a system may be employed with which retrieval of an object is performed without use of related
information). Another structure from which the instruction input portion is omitted may be employed (for example, a
structure may be employed with which the GUI is not used to instruct the retrieval). As a matter of course, this
embodiment can also be realized by software.

[0302] The third embodiment has the structure in which the two focal points and one point on the ellipse are employed
as the representative points when the ellipse is employed. A structure will now be described in which three vertices of
the circumscribing rectangle of an ellipse are employed as the representative points of the ellipse. As a matter of
course, the retrieval is possible regardless of which representative points are employed.

[0303] Note that the following symbols $V_1$ to $V_4$, $P$, $Q$, $F_1$, $F_2$, $C_0$, $T$, $U$ and $C$ are vector quantities.

[0304] Since the present invention is configured to describe the trajectory of the object region, determining the
points through which an object has passed and the points through which it has not passed enables the object to be
retrieved. For example, retrieval such as "retrieve vehicles which have passed through the center of this crossing and
entered that traffic lane" or "retrieve vehicles which have entered the road from this position and which have not
moved to this traffic lane" can be performed.

[0305] FIGS. 42 and 43 show an example of the procedure for performing the foregoing retrieval. 
[0306] FIG. 42 shows an example of the procedure which is employed when a rectangle is employed to express an
object.

[0307] An assumption is made that point $Q$ has been specified as a point through which the object has passed
or has not passed.

[0308] In step S100, the time at which the object appeared is set as time t. In step S101, the coordinates of
representative points $V_1$, $V_2$ and $V_3$ at time t are extracted. The coordinates are calculated as the values of
the spline functions at the corresponding time. The coordinates of the remaining vertex can easily be obtained from
the three vertices of the rectangle, as follows:

$$V_4 = V_1 - V_2 + V_3$$

[0309] In step S102, the values of four functions expressed by the following equations are obtained.

$$f_1(P) = (V_{2y} - V_{1y})(x - V_{1x}) - (V_{2x} - V_{1x})(y - V_{1y})$$
$$f_2(P) = (V_{2y} - V_{3y})(x - V_{2x}) - (V_{2x} - V_{3x})(y - V_{2y})$$
$$f_3(P) = (V_{3y} - V_{4y})(x - V_{3x}) - (V_{3x} - V_{4x})(y - V_{3y})$$
$$f_4(P) = (V_{1y} - V_{4y})(x - V_{4x}) - (V_{1x} - V_{4x})(y - V_{4y})$$

where $V_i = (V_{ix}, V_{iy})$ and $P = (x, y)$.

[0310] In step S103, it is determined whether or not the four obtained functions satisfy the following relationship:

$$f_1(Q) \times f_3(Q) \le 0 \quad \text{and} \quad f_2(Q) \times f_4(Q) \le 0$$
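
The test of steps S102 and S103 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the names `fourth_vertex`, `side` and `point_in_rectangle` are hypothetical, points are plain (x, y) tuples, and the condition is assumed to require that both products be non-positive.

```python
from typing import Tuple

Point = Tuple[float, float]


def fourth_vertex(v1: Point, v2: Point, v3: Point) -> Point:
    """Remaining vertex of the rectangle: V4 = V1 - V2 + V3."""
    return (v1[0] - v2[0] + v3[0], v1[1] - v2[1] + v3[1])


def side(head: Point, tail: Point, anchor: Point, p: Point) -> float:
    """Signed value at p of the line through `anchor` with direction head - tail."""
    return ((head[1] - tail[1]) * (p[0] - anchor[0])
            - (head[0] - tail[0]) * (p[1] - anchor[1]))


def point_in_rectangle(v1: Point, v2: Point, v3: Point, q: Point) -> bool:
    """True when q lies inside or on the rectangle V1 V2 V3 V4."""
    v4 = fourth_vertex(v1, v2, v3)
    f1 = side(v2, v1, v1, q)
    f2 = side(v2, v3, v2, q)
    f3 = side(v3, v4, v3, q)
    f4 = side(v1, v4, v4, q)
    # q lies between each pair of opposite sides when the paired values
    # have opposite signs (or one of them is zero).
    return f1 * f3 <= 0 and f2 * f4 <= 0
```

For the rectangle with $V_1 = (0, 0)$, $V_2 = (4, 0)$ and $V_3 = (4, 2)$, this sketch reports the point (1, 1) as inside and the point (5, 1) as outside.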


[0311] If the foregoing relationship holds, the specified point $Q$ lies inside the rectangle at time t. Therefore, it
is determined that the object passes through the point $Q$ (step S104). If the relationship does not hold, the object
does not pass through the point $Q$ at time t. Then, whether or not the object has passed through the point $Q$ at
another time is detected.

[0312] In step S105, it is determined whether or not all moments of time t have been examined by determining
whether or not time t is the same as the time at which the object disappeared from the screen. If the two moments of
time are the same, it is determined that the object has not passed through the point $Q$ and the process is completed
(step S107). If time t is earlier than the time at which the object disappeared, t is incremented by one in step
S106. Then, the process from step S101 is repeated.

[0313] The foregoing process is performed for all of the objects which are to be retrieved so that objects which
satisfy the retrieval condition can be retrieved.
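
The loop of steps S100 to S107 and its repetition over all objects can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment. The names `has_passed` and `retrieve_passing`, the callable `contains_at(t, q)` (standing in for evaluating the spline functions at time t and applying the inclusion test) and the integer frame times are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
ContainsAt = Callable[[int, Point], bool]


def has_passed(contains_at: ContainsAt, t_appear: int, t_disappear: int,
               q: Point) -> bool:
    """Return True if the object's figure contains q at some time."""
    t = t_appear                    # S100: start at the time of appearance
    while True:
        if contains_at(t, q):       # S101 to S103: inclusion test at time t
            return True             # S104: the object passes through q
        if t == t_disappear:        # S105: all moments have been examined
            return False            # S107: the object never passes through q
        t += 1                      # S106: advance to the next moment


def retrieve_passing(objects: Dict[str, Tuple[ContainsAt, int, int]],
                     q: Point) -> List[str]:
    """Names of all objects whose figure contains q at some time."""
    return [name for name, (contains_at, t0, t1) in objects.items()
            if has_passed(contains_at, t0, t1, q)]
```

Passing the inclusion test as a callable keeps the loop independent of the figure type, so the same loop serves the rectangle and the ellipse.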

[0314] FIG. 43 shows an example of the procedure which is employed when an ellipse is employed to express an
object.

[0315] In step S110, the time at which the object appeared is set as time t.

[0316] In step S111, the coordinates of representative points $V_1$, $V_2$ and $V_3$ of the ellipse at time t are
extracted. The representative points are three vertices of the circumscribing rectangle of the ellipse, arranged
successively clockwise in the order $V_1$, $V_2$ and $V_3$. The calculation is performed by a process similar to that
employed to process the rectangle.

[0317] In step S112, $a$, $b$ and points $F_1$ and $F_2$ expressed by the following equations are obtained ($F_1$ and
$F_2$ are obtained according to the relationship in magnitude between $a$ and $b$):

$$a = |V_2 - V_1| / 2$$
$$b = |V_2 - V_3| / 2$$
$$F_1 = C_0 + e(V_2 - V_1)/2 \ (\text{when } a > b), \qquad F_1 = C_0 + e(V_2 - V_3)/2 \ (\text{when } a \le b)$$
$$F_2 = C_0 - e(V_2 - V_1)/2 \ (\text{when } a > b), \qquad F_2 = C_0 - e(V_2 - V_3)/2 \ (\text{when } a \le b)$$

where $C_0$ and $e$ are as follows ($e$ is determined in accordance with the relationship in magnitude between $a$ and $b$):

$$C_0 = (V_1 + V_3)/2$$
$$e = \sqrt{a^2 - b^2}/a \ (\text{when } a > b), \qquad e = \sqrt{b^2 - a^2}/b \ (\text{when } a \le b)$$

[0318] In step S113, it is determined whether or not the following conditions are satisfied (the conditions vary
according to the relationship in magnitude between $a$ and $b$).

condition when $a > b$:

$$|F_1 - Q| + |F_2 - Q| \le 2a$$

condition when $a \le b$:

$$|F_1 - Q| + |F_2 - Q| \le 2b$$
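
Steps S112 and S113 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `point_in_ellipse` is hypothetical, $a$ is assumed to be half the length of the side $V_1 V_2$ of the circumscribing rectangle and $b$ half the length of the side $V_2 V_3$, and the guard for a degenerate (zero-size) ellipse is an addition of the example.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def point_in_ellipse(v1: Point, v2: Point, v3: Point, q: Point) -> bool:
    """True when q lies inside or on the ellipse inscribed in the
    rectangle whose successive vertices are v1, v2 and v3."""
    a = math.dist(v2, v1) / 2          # semi-axis along V1 V2 (assumed)
    b = math.dist(v2, v3) / 2          # semi-axis along V2 V3
    c0 = ((v1[0] + v3[0]) / 2, (v1[1] + v3[1]) / 2)   # center C0

    if a > b:
        major, axis = a, (v2[0] - v1[0], v2[1] - v1[1])
        e = math.sqrt(a * a - b * b) / a
    else:
        major, axis = b, (v2[0] - v3[0], v2[1] - v3[1])
        if major == 0:                 # degenerate case, not in the source
            return q == c0
        e = math.sqrt(b * b - a * a) / b

    # Focal points F1 and F2 lie on the major axis at distance e * major
    # from the center.
    f1 = (c0[0] + e * axis[0] / 2, c0[1] + e * axis[1] / 2)
    f2 = (c0[0] - e * axis[0] / 2, c0[1] - e * axis[1] / 2)

    # The sum of the distances to the two focal points does not exceed
    # the length of the major axis for points inside the ellipse.
    return math.dist(f1, q) + math.dist(f2, q) <= 2 * major
```

When $a = b$ the eccentricity is zero, both focal points coincide with $C_0$, and the test reduces to a point-in-circle test.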

[0319] When the condition is satisfied, the point $Q$ exists in the ellipse at time t. Therefore, it is determined that
the object has passed through the point $Q$ and the process is completed (step S114). If the condition is not satisfied,
the point $Q$ exists outside the ellipse at time t. Therefore, a similar process is performed for other moments of
time t.

[0320] In step S115, it is determined, as the completion condition, whether or not time t is the time at which the
object disappeared. If time t is the time at which the object disappeared, it is determined that the object has not
passed through the point $Q$. Thus, the process is completed (step S117). If time t is not the time at which the object
disappeared, t is incremented in step S116 and the process from step S111 is repeated.

[0321] The foregoing process is performed for all of the objects which are to be retrieved so that the objects which
satisfy the retrieval conditions are retrieved.

[0322] The foregoing process employs, as the criterion for the determination, whether or not the specified point is
included in the approximate figure. A variety of other criteria may be employed. For example, it may be determined
that the object has passed the point when the specified point exists adjacent to the approximate figure.
Alternatively, it may be determined that the object has passed the point only when the specified point is included
in the approximate figure successively over a predetermined number of frames.
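
The criterion based on a predetermined number of successive frames can be sketched as follows. This is only an illustration: the name `included_for_consecutive_frames` is hypothetical, and the per-frame inclusion results are assumed to be available as a sequence of booleans produced by whichever inclusion test is used.

```python
from typing import Iterable


def included_for_consecutive_frames(inside_flags: Iterable[bool],
                                    min_frames: int) -> bool:
    """True when the point is inside the figure for at least
    `min_frames` successive frames."""
    if min_frames < 1:
        raise ValueError("min_frames must be at least 1")
    run = 0                                 # length of the current run
    for inside in inside_flags:
        run = run + 1 if inside else 0      # a miss resets the run
        if run >= min_frames:
            return True
    return False
```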
[0323] Also in a case where another figure is employed to express the shape of the object, a process corresponding
to the figure is performed. Thus, objects which satisfy the retrieval conditions can be retrieved.
[0324] When a plurality of points of passage or a plurality of points of non-passage have been specified, the
foregoing process is performed for all of the specified points.
[0325] As a matter of course, one or more points of passage and one or more points of non-passage may be
combined with each other.

[0326] The retrieval can be performed by using combinational logic for a plurality of points of passage and
points of non-passage. For example, a retrieval such as "retrieve objects which have passed through either of
points a or b and which have not passed through both of points c and d" can be performed.
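
Such a combined condition can be sketched as a Boolean expression over per-point passage results. This is a minimal illustration under stated assumptions: the names `example_query` and `retrieve` are hypothetical, the passage results are assumed to have been computed beforehand for each object and each specified point, and "have not passed through both of two points" is read here as NOT (c AND d), which is one possible reading.

```python
from typing import Callable, Dict, List

PassageFlags = Dict[str, bool]   # point name -> "object passed through it"


def example_query(passed: PassageFlags) -> bool:
    """(a OR b) AND NOT (c AND d); the point names are illustrative."""
    return (passed["a"] or passed["b"]) and not (passed["c"] and passed["d"])


def retrieve(objects: Dict[str, PassageFlags],
             query: Callable[[PassageFlags], bool]) -> List[str]:
    """Names of the objects whose passage results satisfy the query."""
    return [name for name, passed in objects.items() if query(passed)]
```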

[0327] The retrieval based on the point of passage can be extended to a condition on the time for which the object
exists at the point of passage. Such retrieval includes "retrieve persons who have done free browsing for 10 minutes
or longer" and "retrieve persons who were in front of the cash dispenser for three minutes or longer". To perform the
foregoing retrieval, the time for which the object exists at the input position is measured. Then, only the objects
which exist at the input position for a time longer than the time specified by the user are shown.
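
The measurement of the time spent at the input position can be sketched as follows. This is only an illustration: the names `dwell_seconds` and `stayed_at_least` are hypothetical, the per-frame inclusion results and a constant frame rate are assumed inputs, and counting the total (not necessarily continuous) time with an "or longer" comparison is a choice of the example.

```python
from typing import Iterable


def dwell_seconds(inside_flags: Iterable[bool], frame_rate: float) -> float:
    """Total time, in seconds, during which the position is inside the figure."""
    frames_inside = sum(1 for inside in inside_flags if inside)
    return frames_inside / frame_rate


def stayed_at_least(inside_flags: Iterable[bool], frame_rate: float,
                    min_seconds: float) -> bool:
    """True when the object stayed at the position for min_seconds or longer."""
    return dwell_seconds(inside_flags, frame_rate) >= min_seconds
```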

[0328] Another example of the extended retrieval will now be described in which a condition in terms of the size (the
area of the object) is added.

[0329] When the shape of the object is expressed by a rectangle or an ellipse, the area of the object at certain time
t can be calculated as follows:

in the case of the rectangle,

$$S_R = |V_2 - V_1| \times |V_3 - V_2|$$

in the case of the ellipse,

$$S_E = ab\pi$$

[0330] When the obtained value is used, retrieval can be performed by using a condition that, for example, the area
is not smaller than $S_S$ nor larger than $S_L$. For example, when a retrieval such as "retrieve persons who walk on
the road, but do not retrieve dogs and cats" is required, specifying in advance an area larger than that of dogs and
cats enables the retrieving accuracy to be improved.
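
The area condition can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names are hypothetical, and for the ellipse $a$ and $b$ are assumed to be half the side lengths $|V_2 - V_1|$ and $|V_2 - V_3|$ of the circumscribing rectangle.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rectangle_area(v1: Point, v2: Point, v3: Point) -> float:
    """S_R = |V2 - V1| x |V3 - V2|."""
    return math.dist(v2, v1) * math.dist(v3, v2)


def ellipse_area(v1: Point, v2: Point, v3: Point) -> float:
    """S_E = a * b * pi, with a and b taken from the circumscribing rectangle."""
    a = math.dist(v2, v1) / 2
    b = math.dist(v2, v3) / 2
    return a * b * math.pi


def area_in_range(area: float, s_small: float, s_large: float) -> bool:
    """True when the area is not smaller than S_S nor larger than S_L."""
    return s_small <= area <= s_large
```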

[0331] Another example of the retrieval will now be described with which objects which have moved along similar
trajectories are retrieved.

[0332] An assumption is made that the trajectories of a first object and a second object are $T$ and $U$, respectively.
Another assumption is made that the time for which the first object exists and the time for which the second object
exists are $N_T$ and $N_U$, respectively. An assumption is made that $N_T \le N_U$ in the foregoing case. Another
assumption is made that the time at which each of the objects has appeared is t = 0. The foregoing conditions can
always be satisfied by exchanging $T$ and $U$ and by shifting the origin of the time axis.

[0333] In the foregoing case, distance $d(T, U)$ between $T$ and $U$ is defined as follows:

$$d(T, U) = \min_{0 \le i \le N_U - N_T} \sum_{j=0}^{N_T - 1} E^2(T(j), U(j + i))$$

[0334] The coordinates of $T$ at time t are expressed as $T(t)$, and $E(P, Q)$ denotes the Euclidean distance between
points $P$ and $Q$.
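
The distance between two trajectories can be sketched as follows. This is only an illustration under stated assumptions: the names `squared_distance` and `trajectory_distance` are hypothetical, each trajectory is assumed to be sampled once per frame as a list of (x, y) tuples, the sum is assumed to run over all samples of the shorter trajectory, and the time shift is assumed to range over every offset at which the shorter trajectory fits inside the longer one.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance E^2(P, Q)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def trajectory_distance(t: List[Point], u: List[Point]) -> float:
    """Smallest summed squared distance over all time shifts of the
    shorter trajectory along the longer one."""
    if len(t) > len(u):            # exchange so that len(t) <= len(u)
        t, u = u, t
    if not t:
        raise ValueError("trajectories must not be empty")
    shifts = range(len(u) - len(t) + 1)
    return min(
        sum(squared_distance(t[j], u[j + i]) for j in range(len(t)))
        for i in shifts
    )
```

Minimizing over the shift makes the measure insensitive to when the shorter trajectory occurred within the longer one.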
[0335] By using the distance between the trajectories, the distance between the trajectory of the object specified by
the user and the trajectory of each of the other objects is calculated. Then, the object exhibiting the shortest
distance is displayed, or as many objects as the number specified by the user are displayed in ascending order of
distance. Thus, the objects which draw similar trajectories can be retrieved.
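
Selecting the objects with the shortest distances can be sketched as follows. This is a minimal illustration: the name `most_similar` is hypothetical, the candidate trajectories are assumed to be held in a dictionary keyed by object name, and the trajectory distance is passed in as a function so that any distance measure can be used.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def most_similar(query: Trajectory,
                 candidates: Dict[str, Trajectory],
                 count: int,
                 distance: Callable[[Trajectory, Trajectory], float]) -> List[str]:
    """Names of the `count` candidates closest to the query trajectory,
    in ascending order of distance."""
    return heapq.nsmallest(
        count, candidates, key=lambda name: distance(query, candidates[name])
    )
```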

[0336] Moreover, an object which draws a trajectory similar to a trajectory drawn by a user with an input device such
as a mouse can be retrieved. In the foregoing case, the trajectory drawn by the user does not contain time information.
Therefore, the distance between the trajectories must be calculated by a method distinct from $d(T, U)$. The
distance $d'(T, U)$ between the trajectory $T$ and the trajectory $U$ drawn by the user is calculated as follows:

$$d'(T, U) = \sum_{i=0}^{N_{PU} - 1} \min_{0 \le j < N_T} E^2(T(j), U_i)$$

[0337] The trajectory drawn by the user is expressed by the dot sequence $U_i$ ($0 \le i < N_{PU}$), where $N_{PU}$ is
the number of dots in the sequence. One or more objects exhibiting a short distance are displayed as objects each
drawing a similar trajectory. Thus, retrieval can be performed.
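
The distance to a hand-drawn trajectory can be sketched as follows. This is only an illustration under stated assumptions: the name `drawn_trajectory_distance` is hypothetical, the object trajectory is assumed to be sampled once per frame, and each drawn dot is matched to the nearest sample of the object trajectory regardless of time.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance E^2(P, Q)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def drawn_trajectory_distance(t: List[Point], drawn: List[Point]) -> float:
    """Sum, over the drawn dots, of the squared distance to the nearest
    sample of the object trajectory t."""
    if not t:
        raise ValueError("object trajectory must not be empty")
    return sum(min(squared_distance(sample, dot) for sample in t)
               for dot in drawn)
```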

[0338] When the trajectory of the center of the object has been described, objects exhibiting a short distance
$d(T, U)$ are retrieved by using those trajectories as $T$ and $U$. When only the trajectory of a rectangle or an
ellipse approximating the shape of the object can be obtained, the trajectory of the center is estimated. Then, the
distance between the trajectories of the objects is calculated. An estimated value of center $C$ at certain time t is
obtained from the coordinates $V_1$, $V_2$ and $V_3$ of the vertices of the rectangle or the ellipse as follows:

$$C \approx (V_1 + V_3)/2$$

[0339] As a result of the estimation, similar trajectories can be retrieved from the trajectories of all of the objects.
[0340] Although the example has been described in which the representative points of the approximate figure of the
object region are employed, the present invention may be applied to a case where the characteristic points of the
object region are employed, similarly to the second embodiment. In the foregoing case, whether or not the object has
passed through the specified point is determined in accordance with whether or not the distance between the
characteristic point and the specified point is shorter than a reference value.
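
The determination based on characteristic points can be sketched as follows. This is a minimal illustration: the name `passes_near` is hypothetical, and the positions of one characteristic point over time are assumed to be available as a list of (x, y) tuples, one per frame.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def passes_near(positions: Iterable[Point], q: Point,
                reference: float) -> bool:
    """True when the characteristic point comes closer to q than the
    reference value at some time."""
    return any(math.dist(p, q) < reference for p in positions)
```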

[0341] The foregoing embodiments and structures may arbitrarily be combined with one another. 
[0342] Each of the foregoing structures may be realized by a recording medium storing a program for causing a 
computer to execute a predetermined means (or causing the computer to act as a predetermined means or causing the 
computer to realize a predetermined function). 
[0343] The present invention is configured such that the object region in a video is described as the parameter of a
function approximating the trajectory obtained by arranging positional data of representative points of the approximate
figure of the object region or the characteristic points of the object region in a direction in which frames proceed.
Therefore, the region of a predetermined object can be described with a small quantity of data. Moreover, creation and
handling of data can easily be performed.
[0344] According to the present invention, a user is able to easily instruct an object in a video and determine the
object.

[0345] According to the present invention, retrieval of an object in a video can easily be performed. 
Claims 


1. A region data describing method for describing, over a plurality of frames, region data about a region of an arbitrary
object existing in a video, the region data describing method characterized by comprising:

extracting position data of a representative point of an approximate figure approximating the region or a
characteristic point of the region from the plurality of frames;

determining a function approximating a trajectory which links corresponding representative points or
corresponding characteristic points of successive frames, the function being represented by a parameter; and
describing the parameter of the function as the region data.

2. The region data describing method according to claim 1, characterized by further comprising describing
information specifying a leading frame or a trailing frame of said plurality of frames as the region data.

3. The region data describing method according to claim 2, characterized by further comprising describing informa- 
tion of the type of the approximate figure as the region data. 


4. The region data describing method according to claim 2, characterized by further comprising describing informa- 
tion of the number of the approximate figure as the region data. 
5. The region data describing method according to claim 1, characterized in that the parameter includes position data
of knots of the trajectory and information specifying the trajectory used together with position data of the knots of
the trajectory.

6. The region data describing method according to claim 1, characterized in that

a plurality of the representative points or the characteristic points are included in a certain frame, and
the region data includes information specifying correspondence among a plurality of said representative points
or characteristic points in the certain frame and a plurality of said representative points or characteristic points
in an adjacent frame.

7. The region data describing method according to claim 1, characterized by further comprising describing related 
information related to the object or information indicating a method of accessing to the related information. 

8. A region data generating apparatus for generating region data about a region of an arbitrary object existing in a
plurality of frames of a video, the region data generating apparatus characterized by comprising:

an extracting circuit (103, 233) configured to extract position data of a representative point of an approximate
figure approximating the region or a characteristic point of the region from the plurality of frames;
a function determining circuit (104, 234) configured to determine a function approximating a trajectory which
links corresponding representative points or corresponding characteristic points of successive frames, the
function being represented by a parameter; and

a describing circuit (106, 236) configured to describe the parameter of the function as the region data.

9. The region data generating apparatus according to claim 8, characterized in that said describing circuit describes
information specifying a leading frame or a trailing frame of said plurality of frames.

10. The region data generating apparatus according to claim 9, characterized in that said describing circuit describes 
information of the type of the approximate figure. 


11. The region data generating apparatus according to claim 9, characterized in that said describing circuit describes 
information of the number of the approximate figure. 

12. The region data generating apparatus according to claim 8, characterized in that the parameter includes position
data of knots of the trajectory and information specifying the trajectory and used together with position data of the
knots of the trajectory.

13. The region data generating apparatus according to claim 8, characterized in that 

a plurality of the representative points or the characteristic points are included in a certain frame, and

the region data includes information specifying correspondence among a plurality of said representative points 
or characteristic points in the certain frame and a plurality of said representative points or characteristic points 
in an adjacent frame. 

14. The region data generating apparatus according to claim 8, characterized in that said describing circuit describes
related information related to the object or information indicating a method of accessing to the related information.

15. A storing medium storing a computer program for describing, over a plurality of frames, region data about a region
of an arbitrary object existing in a video, the computer program characterized by comprising:

a first program code of extracting position data of a representative point of an approximate figure approximating
the region or a characteristic point of the region from the plurality of frames;

a second program code of determining a function approximating a trajectory which links corresponding
representative points or corresponding characteristic points of successive frames, the function being represented by
a parameter; and

a third program code of describing the parameter of the function.

16. The storing medium according to claim 15, characterized in that said third program code describes information
specifying a leading frame or a trailing frame of said plurality of frames.

17. The storing medium according to claim 16, characterized in that said third program code describes information of
the type of the approximate figure.


18. The storing medium according to claim 16, characterized in that said third program code describes information of 
the number of the approximate figure. 

19. The storing medium according to claim 15, characterized in that the parameter is position data of knots of the
trajectory and information specifying the trajectory and used together with position data of the knots of the trajectory.

20. The storing medium according to claim 15, characterized in that

a plurality of the representative points or the characteristic points are included in a certain frame, and
said third program code describes information specifying correspondence among a plurality of said
representative points or characteristic points in the certain frame and a plurality of said representative points or
characteristic points in an adjacent frame.

21. The storing medium according to claim 15, characterized in that said third program code describes related
information related to the object or information indicating a method of accessing to the related information.

22. The storing medium according to claim 15, characterized in that the region data comprises identification
information of the object, information specifying a leading frame and a trailing frame of said plurality of frames,
information related to the object, information indicating a method of accessing to the related information, information
of the number of the approximate figure, and approximate figure information which includes information of the type of
the approximate figure, number information of the representative point, and function data of a spline function
approximating the trajectories of the representative point which includes knot information, order information of the
spline function, and coefficient information of the spline function.

23. The storing medium according to claim 15, characterized in that the region data comprises identification
information of the object, information specifying a leading frame and a trailing frame of said plurality of frames,
related information related to the object, information indicating a method of accessing to the related information, and
characteristic point information which includes information of the number of the characteristic point and function
data of a spline function approximating the trajectories of the characteristic point which includes knot information,
order information of the spline function, and coefficient information of the spline function.